NOTES FOR CMOS DEVICES


PRECAUTION AGAINST ESD FOR SEMICONDUCTORS

Note: 

A strong electric field, when applied to a MOS device, can destroy the gate oxide and
ultimately degrade device operation. Steps must be taken to prevent the generation of static
electricity as far as possible, and to dissipate it quickly once it has occurred. Environmental
control must be adequate: when the air is dry, a humidifier should be used. Avoid insulators
that easily build up static electricity. Semiconductor devices must be stored and transported
in an anti-static container, a static shielding bag, or conductive material. All test and
measurement tools, including work benches and floors, should be grounded. The operator should
be grounded using a wrist strap. Semiconductor devices must not be touched with bare hands.
Similar precautions must be taken for PW boards with semiconductor devices mounted on them.


HANDLING OF UNUSED INPUT PINS FOR CMOS 

Note: 

Leaving CMOS device inputs unconnected can cause malfunction. If an input pin is left
unconnected, noise or similar effects may generate an internal input level and cause the
device to malfunction. CMOS devices behave differently from bipolar or NMOS devices. The input
levels of CMOS devices must be fixed high or low by using pull-up or pull-down circuitry. Each
unused pin should be connected to VDD or GND through a resistor if there is any possibility
that it may become an output pin. The handling of unused pins must be judged device by device,
in accordance with the specifications governing each device.


STATUS BEFORE INITIALIZATION OF MOS DEVICES 

Note: 

Power-on does not necessarily define the initial status of a MOS device. The production
process of MOS devices does not define the initial operating status of the device. Immediately
after the power source is turned ON, devices with a reset function have not yet been
initialized. Hence, power-on does not guarantee output pin levels, I/O settings, or the
contents of registers. A device is not initialized until the reset signal is received. The
reset operation must be executed immediately after power-on for devices having a reset
function.


VR4200, VR4300, VR4400, VR5000, VR10000, and VR Series are trademarks of NEC Corporation.

R4000 is a trademark of MIPS Computer Systems, Inc. 

MIPS is a registered trademark of MIPS Technologies, Inc. in the United States. 

R4200, R4300, R4400, R5000, and R10000 are trademarks of MIPS Technologies, Inc. 

UNIX is a registered trademark in the United States and other countries, licensed exclusively through 
X/Open Company, Ltd. 


The export of this product from Japan is prohibited without governmental license. To export or re-export this product from 
a country other than Japan may also be prohibited without a license from that country. Please call an NEC sales 
representative. 


Exporting this product or equipment that includes this product may require a governmental license from the U.S.A. for some 
countries because this product utilizes technologies limited by the export control regulations of the U.S.A. 


The information in this document is current as of May, 1997. The information is subject to change 
without notice. For actual design-in, refer to the latest publications of NEC's data sheets or data 
books, etc., for the most up-to-date specifications of NEC semiconductor products. Not all products 
and/or types are available in every country. Please check with an NEC sales representative for 
availability and additional information. 

No part of this document may be copied or reproduced in any form or by any means without prior
written consent of NEC. NEC assumes no responsibility for any errors that may appear in this document.

NEC does not assume any liability for infringement of patents, copyrights or other intellectual property rights of
third parties by or arising from the use of NEC semiconductor products listed in this document or any other
liability arising from the use of such products. No license, express, implied or otherwise, is granted under any
patents, copyrights or other intellectual property rights of NEC or others.

Descriptions of circuits, software and other related information in this document are provided for illustrative
purposes in semiconductor product operation and application examples. The incorporation of these
circuits, software and information in the design of customer's equipment shall be done under the full
responsibility of customer. NEC assumes no responsibility for any losses incurred by customers or third
parties arising from the use of these circuits, software and information.

While NEC endeavours to enhance the quality, reliability and safety of NEC semiconductor products, customers 
agree and acknowledge that the possibility of defects thereof cannot be eliminated entirely. To minimize 
risks of damage to property or injury (including death) to persons arising from defects in NEC 
semiconductor products, customers must incorporate sufficient safety measures in their design, such as 
redundancy, fire-containment, and anti-failure features. 

NEC semiconductor products are classified into the following three quality grades:
"Standard", "Special" and "Specific". The "Specific" quality grade applies only to semiconductor products
developed based on a customer-designated "quality assurance program" for a specific application. The
recommended applications of a semiconductor product depend on its quality grade, as indicated below.
Customers must check the quality grade of each semiconductor product before using it in a particular
application.

"Standard": Computers, office equipment, communications equipment, test and measurement equipment, audio 
and visual equipment, home electronic appliances, machine tools, personal electronic equipment 
and industrial robots 

"Special": Transportation equipment (automobiles, trains, ships, etc.), traffic control systems, anti-disaster 
systems, anti-crime systems, safety equipment and medical equipment (not specifically designed 
for life support) 

"Specific": Aircraft, aerospace equipment, submersible repeaters, nuclear reactor control systems, life 
support systems and medical equipment for life support, etc. 

The quality grade of NEC semiconductor products is "Standard" unless otherwise expressly specified in NEC's 
data sheets or data books, etc. If customers wish to use NEC semiconductor products in applications not 
intended by NEC, they must contact an NEC sales representative in advance to determine NEC's willingness 
to support a given application. 

(Note) 

(1) "NEC" as used in this statement means NEC Corporation and also includes its majority-owned subsidiaries. 

(2) "NEC semiconductor products" means any semiconductor product developed or manufactured by or for
NEC (as defined above).
Regional Information


Some information contained in this document may vary from country to country. Before using any NEC 
product in your application, please contact the NEC office in your country to obtain a list of authorized 
representatives and distributors. They will verify: 


• Device availability
• Ordering information
• Product release schedule
• Availability of related technical literature
• Development environment specifications (for example, specifications for third-party tools and
  components, host computers, power plugs, AC supply voltages, and so forth)
• Network requirements


In addition, trademarks, registered trademarks, export restrictions, and other legal issues may also vary
from country to country.
PREFACE


Readers

This manual targets users who wish to understand the functions of the VR5000 and the
VR10000 and design application systems using these microprocessors.


Purpose

This manual introduces the instruction set of the VR5000 and the VR10000.


Organization

This manual consists of the following contents:

• CPU Instruction set
• FPU Instruction set


How to read this manual

It is assumed that the reader of this manual has general knowledge in the fields of
electrical engineering, logic circuits, and microcomputers.

The R4200™ in this manual represents the VR4200™.
The R4300™ in this manual represents the VR4300™.
The R4400™ in this manual represents the VR4400™.
The R5000™ in this manual represents the VR5000.
The R10000™ in this manual represents the VR10000.

To learn about the detailed function of a specific instruction
-> Read this manual in sequential order.

To learn about architecture and hardware functions
-> Refer to the User’s Manual of each device.

To learn about electrical specifications
-> Refer to the Data Sheet of each device.


Legend

Data significance: Higher on left and lower on right
Active low: XXX*
Numeric representation: binary ... XXXX or XXXX₂
                        decimal ... XXXX
                        hexadecimal ... 0xXXXX
Prefixes representing an exponent of 2 (for address space or memory capacity):
    K (kilo)  2^10 = 1024
    M (mega)  2^20 = 1024^2
    G (giga)  2^30 = 1024^3
    T (tera)  2^40 = 1024^4
    P (peta)  2^50 = 1024^5
    E (exa)   2^60 = 1024^6


Related Documents

The related documents indicated here may include preliminary versions. However,
preliminary versions are not marked as such.

Document Name    Data Sheet    User’s Manual
Product Name                   Hardware | Architecture | Instruction

VR5000     U12031E    U11761E    U12754E
VR10000    Planned    U10278E    (This manual)
Chapter 1 CPU Instruction Set


1.1 Introduction 


This chapter describes the instruction set architecture (ISA) for the central processing unit 
(CPU) in the MIPS™ IV architecture. The CPU architecture defines the non-privileged
instructions that execute in user mode. It does not define privileged instructions providing 
processor control executed by the implementation-specific System Control Processor. 
Instructions for the floating-point unit are described in Chapter 2. 


The original MIPS I CPU ISA has been extended in a backward-compatible fashion three times.
The ISA extensions are inclusive as the diagram illustrates; each new architecture level (or
version) includes the former levels. The description of an architectural feature includes the
architecture level in which the feature is (first) defined or extended. The feature is also
available in all later (higher) levels of the architecture.


[Figure: MIPS Architecture Extensions (nested levels MIPS I, MIPS II, MIPS III, MIPS IV)]


The practical result is that a processor implementing MIPS IV is also able to run MIPS I, 
MIPS II, or MIPS III binary programs without change.




The CPU instruction set is first summarized by functional group, then each instruction is
described separately in alphabetical order. This manual describes the organization of the
individual instruction descriptions and the notation used in them (including FPU
instructions). It concludes with the CPU instruction formats and opcode encoding tables.


1.2 Functional Instruction Groups 


CPU instructions are divided into the following functional groups:

• Load and Store
• ALU
• Jump and Branch
• Miscellaneous
• Coprocessor


1.2.1 Load and Store Instructions 


Load and store instructions transfer data between the memory system and the general 
register sets in the CPU and the coprocessors. There are separate instructions for different 
purposes: transferring various sized fields, treating loaded data as signed or unsigned 
integers, accessing unaligned fields, selecting the addressing mode, and providing atomic 
memory update (read-modify-write). 


Regardless of byte ordering (big- or little-endian), the address of a halfword, word, or 
doubleword is the smallest byte address among the bytes forming the object. For big- 
endian ordering this is the most-significant byte; for a little-endian ordering this is the least- 
significant byte. 


Except for the few specialized instructions listed in Table 1-4, loads and stores must access 
naturally aligned objects. An attempt to load or store an object at an address that is not an 
even multiple of the size of the object will cause an Address Error exception. 
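The addressing and alignment rules above can be illustrated with a small model. The following Python sketch is not part of the architecture definition; the helper name `store_word` and the use of `ValueError` to stand in for the Address Error exception are assumptions made for illustration.

```python
def store_word(memory: bytearray, address: int, value: int, big_endian: bool) -> None:
    """Store a 32-bit word; `address` names the smallest byte address of the word."""
    if address % 4 != 0:
        # Stand-in for the Address Error exception (assumption of this sketch).
        raise ValueError("Address Error: word access is not naturally aligned")
    order = "big" if big_endian else "little"
    memory[address:address + 4] = value.to_bytes(4, order)


memory = bytearray(8)
store_word(memory, 4, 0x11223344, big_endian=True)
print(hex(memory[4]))   # byte at the word's address is the most-significant byte
store_word(memory, 4, 0x11223344, big_endian=False)
print(hex(memory[4]))   # byte at the word's address is the least-significant byte
```

In both byte orderings the word is addressed as 4; only the significance of the byte found at that address changes.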


Load and store operations have been added in each revision of the architecture:

MIPS II
• 64-bit coprocessor transfers
• atomic update

MIPS III
• 64-bit CPU transfers
• unsigned word load for CPU

MIPS IV
• register + register addressing mode for FPU


Tables 1-1 and 1-2 tabulate the supported load and store operations and indicate the MIPS
architecture level at which each operation was first supported. The instructions themselves
are listed in the following sections.


Table 1-1 Load/Store Operations Using Register + Offset Addressing Mode


                                   CPU                             coprocessor (except 0)
Data Size                          Load      Load       Store      Load      Store
                                   Signed    Unsigned
byte                               I         I          I
halfword                           I         I          I
word                               I         III        I          I         I
doubleword                         III                  III        II        II
unaligned word                     I                    I
unaligned doubleword               III                  III
linked word (atomic modify)        II                   II
linked doubleword (atomic modify)  III                  III


Table 1-2 Load/Store Operations Using Register + Register Addressing Mode


                 floating-point coprocessor only
Data Size        Load      Store
word             IV        IV
doubleword       IV        IV


(1) Delayed Loads


The MIPS I architecture defines delayed loads; an instruction scheduling restriction
requires that an instruction immediately following a load into register Rn cannot use Rn as
a source register. The time between the load instruction and the time the data is available
is the “load delay slot”. If no useful instruction can be put into the load delay slot, then a
null operation (assembler mnemonic NOP) must be inserted.


In MIPS II, this instruction scheduling restriction is removed. Programs will execute 
correctly when the loaded data is used by the instruction following the load, but this may 
require extra real cycles. Most processors cannot actually load data quickly enough for 
immediate use and the processor will be forced to wait until the data is available. 
Scheduling load delay slots is desirable for performance reasons even when it is not 
necessary for correctness. 
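The MIPS I scheduling restriction can be expressed as a simple check over an instruction sequence. The sketch below is only an illustration: the tuple representation of instructions, the register names, and the function name `load_delay_violations` are hypothetical, and only a subset of load mnemonics is modeled.

```python
# Loads subject to the MIPS I load delay slot (subset chosen for this sketch).
LOADS = {"LB", "LBU", "LH", "LHU", "LW"}


def load_delay_violations(program):
    """Return indices of instructions that read a register loaded by the previous instruction.

    `program` is a list of (mnemonic, destination, sources) tuples.
    """
    violations = []
    for index in range(len(program) - 1):
        mnemonic, destination, _ = program[index]
        _, _, next_sources = program[index + 1]
        if mnemonic in LOADS and destination in next_sources:
            violations.append(index + 1)
    return violations


unscheduled = [("LW", "t0", ("a0",)), ("ADDU", "t1", ("t0", "t2"))]
scheduled = [("LW", "t0", ("a0",)), ("NOP", None, ()), ("ADDU", "t1", ("t0", "t2"))]
print(load_delay_violations(unscheduled))  # the ADDU at index 1 is flagged
print(load_delay_violations(scheduled))    # no violations once a NOP fills the slot
```

Under MIPS II and later the flagged sequence is still correct; a check of this kind would then serve only as a performance hint.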


(2) CPU Loads and Stores 




There are instructions to transfer different amounts of data: bytes, halfwords, words, and 
doublewords. Signed and unsigned integers of different sizes are supported by loads that 
either sign-extend or zero-extend the data loaded into the register. 
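The difference between sign extension and zero extension can be sketched as follows. This is an illustrative model that assumes 64-bit general registers; the function names are not taken from this manual.

```python
REGISTER_BITS = 64  # assumption: 64-bit general registers


def sign_extend(value: int, bits: int) -> int:
    """Replicate the top bit of a `bits`-wide value across the register (as LB, LH, LW do)."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & ((1 << REGISTER_BITS) - 1)


def zero_extend(value: int, bits: int) -> int:
    """Fill the upper register bits with zeros (as LBU, LHU, LWU do)."""
    return value & ((1 << bits) - 1)


print(hex(sign_extend(0x80, 8)))   # byte 0x80 loaded as a signed value
print(hex(zero_extend(0x80, 8)))   # byte 0x80 loaded as an unsigned value
```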


Table 1-3 Normal CPU Load/Store Instructions 


Mnemonic    Description                 Defined in
LB          Load Byte                   MIPS I
LBU         Load Byte Unsigned          I
SB          Store Byte                  I
LH          Load Halfword               I
LHU         Load Halfword Unsigned      I
SH          Store Halfword              I
LW          Load Word                   I
LWU         Load Word Unsigned          III
SW          Store Word                  I
LD          Load Doubleword             III
SD          Store Doubleword            III


Unaligned words and doublewords can be loaded or stored in only two instructions by 
using a pair of special instructions. The load instructions read the left-side or right-side 
bytes (left or right side of register) from an aligned word and merge them into the correct 
bytes of the destination register. MIPS I, though it prohibits other use of loaded data in the 
load delay slot, permits LWL and LWR instructions targeting the same destination register 
to be executed sequentially. Store instructions select the correct bytes from a source 
register and update only those bytes in an aligned memory word (or doubleword). 
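The way a left/right pair assembles an unaligned word can be sketched as below. The model assumes big-endian byte ordering and a 32-bit destination register; the function names are chosen for illustration and the code is not the formal instruction definition.

```python
def aligned_word(memory: bytes, address: int) -> int:
    """Read the aligned 32-bit word containing `address` (big-endian assumed)."""
    base = address & ~3
    return int.from_bytes(memory[base:base + 4], "big")


def lwl(memory: bytes, address: int, reg: int) -> int:
    """Merge bytes from `address` to the end of its aligned word into the left of `reg`."""
    byte = address & 3
    keep = (1 << (8 * byte)) - 1                 # low-order register bytes left unchanged
    word = aligned_word(memory, address)
    return ((word << (8 * byte)) & 0xFFFFFFFF) | (reg & keep)


def lwr(memory: bytes, address: int, reg: int) -> int:
    """Merge bytes from the start of the aligned word through `address` into the right of `reg`."""
    byte = address & 3
    keep = (0xFFFFFFFF << (8 * (byte + 1))) & 0xFFFFFFFF   # high-order bytes left unchanged
    word = aligned_word(memory, address)
    return (reg & keep) | (word >> (8 * (3 - byte)))


def load_unaligned_word(memory: bytes, address: int) -> int:
    """LWL on the first byte followed by LWR on the last byte of the word."""
    return lwr(memory, address + 3, lwl(memory, address, 0))


memory = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])
print(hex(load_unaligned_word(memory, 1)))   # word spanning bytes 1 to 4
```

For little-endian ordering the roles of the two addresses are exchanged; that case is not modeled here.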


Table 1-4 Unaligned CPU Load/Store Instructions 


Mnemonic    Description                 Defined in
LWL         Load Word Left              MIPS I
LWR         Load Word Right             I
SWL         Store Word Left             I
SWR         Store Word Right            I
LDL         Load Doubleword Left        III
LDR         Load Doubleword Right       III
SDL         Store Doubleword Left       III
SDR         Store Doubleword Right      III
(3) Atomic Update Loads and Stores


There are paired instructions, Load Linked and Store Conditional, that can be used to 
perform atomic read-modify-write of word and doubleword cached memory locations. 
These instructions are used in carefully coded sequences to provide one of several 
synchronization primitives, including test-and-set, bit-level locks, semaphores, and 
sequencers/event counts. The individual instruction descriptions describe how to use them. 
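The retry structure of a Load Linked / Store Conditional sequence can be sketched with a toy model. The class `LinkedMemory`, its single link variable, and the `interfere` hook are assumptions of this illustration; real hardware tracks the link differently, and the individual instruction descriptions remain the authority on correct usage.

```python
class LinkedMemory:
    """Toy word memory with one link, standing in for the LL/SC link state."""

    def __init__(self):
        self.words = {}
        self.link = None          # address of the last Load Linked, if still valid

    def load_linked(self, address):
        self.link = address
        return self.words.get(address, 0)

    def store(self, address, value):
        # In this model, any ordinary store to the linked address breaks the link.
        if self.link == address:
            self.link = None
        self.words[address] = value

    def store_conditional(self, address, value):
        if self.link != address:
            return 0              # failure: memory is left unchanged
        self.words[address] = value
        self.link = None
        return 1                  # success


def atomic_increment(memory, address, interfere=None):
    """Retry LL / modify / SC until the store succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        old = memory.load_linked(address)
        if interfere is not None:
            interfere(attempts)   # hypothetical hook simulating another writer
        if memory.store_conditional(address, old + 1):
            return attempts
```

The loop repeats the whole read-modify-write when the conditional store fails, which is what makes the update appear atomic to other writers.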


Table 1-5 Atomic Update CPU Load/Store Instructions 


Mnemonic    Description                     Defined in
LL          Load Linked Word                MIPS II
SC          Store Conditional Word          II
LLD         Load Linked Doubleword          III
SCD         Store Conditional Doubleword    III


(4) Coprocessor Loads and Stores 


These loads and stores are coprocessor instructions; however, it is more useful to
summarize all load and store instructions in one place than to list them in the
coprocessor instructions functional group.


If a particular coprocessor is not enabled, loads and stores to that coprocessor cannot execute
and will cause a Coprocessor Unusable exception. Enabling a coprocessor is a privileged
operation provided by the System Control Coprocessor.


Table 1-6 Coprocessor Load/Store Instructions 


Mnemonic    Description                             Defined in
LWCz        Load Word to Coprocessor-z              MIPS I
SWCz        Store Word from Coprocessor-z           I
LDCz        Load Doubleword to Coprocessor-z        II
SDCz        Store Doubleword from Coprocessor-z     II


Table 1-7 FPU Load/Store Instructions Using Register + Register Addressing 


Mnemonic    Description                                     Defined in
LWXC1       Load Word Indexed to Floating Point             MIPS IV
SWXC1       Store Word Indexed from Floating Point          IV
LDXC1       Load Doubleword Indexed to Floating Point       IV
SDXC1       Store Doubleword Indexed from Floating Point    IV




1.2.2 Computational Instructions 


(1) ALU 


Two’s complement arithmetic is performed on integers represented in two’s complement 
notation. There are signed versions of add, subtract, multiply, and divide. There are add 
and subtract operations, called “unsigned”, that are actually modulo arithmetic without 
overflow detection. There are unsigned versions of multiply and divide. There is a full 
complement of shift and logical operations. 


MIPS I provides 32-bit integers and 32-bit arithmetic. MIPS III adds 64-bit integers and
provides separate arithmetic and shift instructions for 64-bit operands. Logical operations
are not sensitive to the width of the register.


Some arithmetic and logical instructions operate on one operand from a register and the 
other from a 16-bit immediate value in the instruction word. The immediate operand is 
treated as signed for the arithmetic and compare instructions, and treated as logical (zero- 
extended to register length) for the logical instructions. 
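The two treatments of the 16-bit immediate can be contrasted with a short model. The sketch assumes 32-bit registers and models only ADDIU and ORI; it is an illustration, not the formal instruction definition.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers for this sketch


def addiu(rs: int, imm16: int) -> int:
    """Arithmetic immediate: the 16-bit field is sign-extended before the add."""
    imm = imm16 & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (rs + imm) & MASK32


def ori(rs: int, imm16: int) -> int:
    """Logical immediate: the 16-bit field is zero-extended before the OR."""
    return (rs | (imm16 & 0xFFFF)) & MASK32


print(hex(addiu(5, 0xFFFF)))  # immediate 0xFFFF acts as -1
print(hex(ori(5, 0xFFFF)))    # immediate 0xFFFF acts as 0x0000FFFF
```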


Table 1-8 ALU Instructions With an Immediate Operand 


Mnemonic    Description                             Defined in
ADDI        Add Immediate Word                      MIPS I
ADDIU       Add Immediate Unsigned Word             I
SLTI        Set on Less Than Immediate              I
SLTIU       Set on Less Than Immediate Unsigned     I
ANDI        And Immediate                           I
ORI         Or Immediate                            I
XORI        Exclusive Or Immediate                  I
LUI         Load Upper Immediate                    I
DADDI       Doubleword Add Immediate                III
DADDIU      Doubleword Add Immediate Unsigned       III
Table 1-9 3-Operand ALU Instructions


Mnemonic    Description                     Defined in
ADD         Add Word                        MIPS I
ADDU        Add Unsigned Word               I
SUB         Subtract Word                   I
SUBU        Subtract Unsigned Word          I
DADD        Doubleword Add                  III
DADDU       Doubleword Add Unsigned         III
DSUB        Doubleword Subtract             III
DSUBU       Doubleword Subtract Unsigned    III
SLT         Set on Less Than                I
SLTU        Set on Less Than Unsigned       I
AND         And                             I
OR          Or                              I
XOR         Exclusive Or                    I
NOR         Nor                             I


(2) Shifts 


There are shift instructions that take the shift amount from a 5-bit field in the instruction 
word and shift instructions that take a shift amount from the low-order bits of a general 
register. The instructions with a fixed shift amount are limited to a 5-bit shift count, so 
there are separate instructions for doubleword shifts of 0-31 bits and 32-63 bits. 
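The split between the two fixed-amount doubleword shift forms can be sketched as a selection rule. The function name is hypothetical and only the left logical shift is shown; the right shifts follow the same pattern.

```python
def fixed_doubleword_shift_left(amount: int):
    """Return the (mnemonic, 5-bit shift field) pair for a doubleword left shift of 0-63 bits."""
    if not 0 <= amount <= 63:
        raise ValueError("shift amount must be in the range 0-63")
    if amount < 32:
        return ("DSLL", amount)          # shifts of 0-31 bits
    return ("DSLL32", amount - 32)       # shifts of 32-63 bits: the field holds amount - 32


print(fixed_doubleword_shift_left(8))
print(fixed_doubleword_shift_left(40))
```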


Table 1-10 Shift Instructions 


Mnemonic    Description                                     Defined in
SLL         Shift Word Left Logical                         MIPS I
SRL         Shift Word Right Logical                        I
SRA         Shift Word Right Arithmetic                     I
SLLV        Shift Word Left Logical Variable                I
SRLV        Shift Word Right Logical Variable               I
SRAV        Shift Word Right Arithmetic Variable            I
DSLL        Doubleword Shift Left Logical                   III
DSRL        Doubleword Shift Right Logical                  III
DSRA        Doubleword Shift Right Arithmetic               III
DSLL32      Doubleword Shift Left Logical + 32              III
DSRL32      Doubleword Shift Right Logical + 32             III
DSRA32      Doubleword Shift Right Arithmetic + 32          III
DSLLV       Doubleword Shift Left Logical Variable          III
DSRLV       Doubleword Shift Right Logical Variable         III
DSRAV       Doubleword Shift Right Arithmetic Variable      III
(3) Multiply and Divide


The multiply and divide instructions produce twice as many result bits as is typical with 
other processors and they deliver their results into the HI and LO special registers. Multiply 
produces a full-width product twice the width of the input operands; the low half is put in 
LO and the high half is put in HI. Divide produces both a quotient in LO and a remainder 
in HI. The results are accessed by instructions that transfer data between HI/LO and the 
general registers. 
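The placement of results in HI and LO can be sketched for the signed 32-bit forms. This model assumes that division truncates toward zero, so the remainder takes the sign of the dividend; that rounding rule is an assumption of the sketch. Division by zero, whose result is not modeled here, simply raises a Python exception.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mult(rs: int, rt: int):
    """Signed 32 x 32 multiply; returns (HI, LO) holding the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & ((1 << 64) - 1)
    return (product >> 32) & MASK32, product & MASK32


def div(rs: int, rt: int):
    """Signed 32-bit divide; returns (HI, LO) = (remainder, quotient)."""
    dividend, divisor = to_signed32(rs), to_signed32(rt)
    quotient = abs(dividend) // abs(divisor)      # raises ZeroDivisionError for rt == 0
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient                      # truncate toward zero (assumption)
    remainder = dividend - quotient * divisor
    return remainder & MASK32, quotient & MASK32


hi, lo = mult(0x10000, 0x10000)
print(hex(hi), hex(lo))   # product 2**32: high half in HI, low half in LO
```

In a program, the two halves would then be copied to general registers with MFHI and MFLO.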


Table 1-11 Multiply/Divide Instructions 


Mnemonic    Description                     Defined in
MULT        Multiply Word                   MIPS I
MULTU       Multiply Unsigned Word          I
DIV         Divide Word                     I
DIVU        Divide Unsigned Word            I
DMULT       Doubleword Multiply             III
DMULTU      Doubleword Multiply Unsigned    III
DDIV        Doubleword Divide               III
DDIVU       Doubleword Divide Unsigned      III
MFHI        Move From HI                    I
MTHI        Move To HI                      I
MFLO        Move From LO                    I
MTLO        Move To LO                      I


1.2.3 Jump and Branch Instructions 


The architecture defines PC-relative conditional branches, a PC-region unconditional 
jump, an absolute (register) unconditional jump, and a similar set of procedure calls that 
record a return link address in a general register. For convenience this discussion refers to 
them all as branches. 


All branches have an architectural delay of one instruction. When a branch is taken, the 
instruction immediately following the branch instruction, in the branch delay slot, is 
executed before the branch to the target instruction takes place. Conditional branches come 
in two versions that treat the instruction in the delay slot differently when the branch is not 
taken and execution falls through. The “branch” instructions execute the instruction in the 
delay slot, but the “branch likely” instructions do not (they are said to nullify it). 
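The delay-slot behavior of the two conditional branch versions can be summarized by a small decision function. The function name and the string labels are chosen for this illustration only.

```python
def executed_after_branch(taken: bool, likely: bool):
    """List what executes after a conditional branch, in order."""
    trace = []
    # The delay slot is nullified only by a "branch likely" that is not taken.
    if taken or not likely:
        trace.append("delay slot")
    trace.append("target" if taken else "fall-through")
    return trace


for likely in (False, True):
    for taken in (True, False):
        print(likely, taken, executed_after_branch(taken, likely))
```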


By convention, if an exception or interrupt prevents the completion of an instruction 
occupying a branch delay slot, the instruction stream is continued by re-executing the 
branch instruction. To permit this, branches must be restartable; procedure calls may not 
use the register in which the return link is stored (usually register 31) to determine the 
branch target address. 




Table 1-12 Jump Instructions Jumping Within a 256 Megabyte Region


Mnemonic Description Defined in 
J Jump MIPS I 
JAL Jump and Link I 


Table 1-13 Jump Instructions to Absolute Address 


Mnemonic Description Defined in 
JR Jump Register MIPS I 
JALR Jump and Link Register I 


Table 1-14 PC-Relative Conditional Branch Instructions Comparing 2 Registers


Mnemonic    Description                                     Defined in
BEQ         Branch on Equal                                 MIPS I
BNE         Branch on Not Equal                             I
BLEZ        Branch on Less Than or Equal to Zero            I
BGTZ        Branch on Greater Than Zero                     I
BEQL        Branch on Equal Likely                          II
BNEL        Branch on Not Equal Likely                      II
BLEZL       Branch on Less Than or Equal to Zero Likely     II
BGTZL       Branch on Greater Than Zero Likely              II


Table 1-15 PC-Relative Conditional Branch Instructions Comparing Against Zero


Mnemonic    Description                                             Defined in
BLTZ        Branch on Less Than Zero                                MIPS I
BGEZ        Branch on Greater Than or Equal to Zero                 I
BLTZAL      Branch on Less Than Zero and Link                       I
BGEZAL      Branch on Greater Than or Equal to Zero and Link        I
BLTZL       Branch on Less Than Zero Likely                         II
BGEZL       Branch on Greater Than or Equal to Zero Likely          II
BLTZALL     Branch on Less Than Zero and Link Likely                II
BGEZALL     Branch on Greater Than or Equal to Zero and Link Likely II


1.2.4 Miscellaneous Instructions 


(1) Exception Instructions 


Exception instructions have as their sole purpose causing an exception that will transfer 
control to a software exception handler in the kernel. System call and breakpoint 
instructions cause exceptions unconditionally. The trap instructions cause exceptions 
conditionally based upon the result of a comparison. 


Table 1-16 System Call and Breakpoint Instructions 


Mnemonic    Description     Defined in
SYSCALL     System Call     MIPS I
BREAK       Breakpoint      I
Table 1-17 Trap-on-Condition Instructions Comparing Two Registers


Mnemonic    Description                                 Defined in
TGE         Trap if Greater Than or Equal               MIPS II
TGEU        Trap if Greater Than or Equal Unsigned      II
TLT         Trap if Less Than                           II
TLTU        Trap if Less Than Unsigned                  II
TEQ         Trap if Equal                               II
TNE         Trap if Not Equal                           II


Table 1-18 Trap-on-Condition Instructions Comparing an Immediate 


Mnemonic    Description                                         Defined in
TGEI        Trap if Greater Than or Equal Immediate             MIPS II
TGEIU       Trap if Greater Than or Equal Unsigned Immediate    II
TLTI        Trap if Less Than Immediate                         II
TLTIU       Trap if Less Than Unsigned Immediate                II
TEQI        Trap if Equal Immediate                             II
TNEI        Trap if Not Equal Immediate                         II


(2) Serialization Instructions 


The order in which memory accesses from load and store instructions appear outside the
processor executing them, in a multiprocessor system for example, is not specified by the 
architecture. The SYNC instruction creates a point in the executing instruction stream at 
which the relative order of some loads and stores is known. Loads and stores executed 
before the SYNC are completed before loads and stores after the SYNC can start. 


Table 1-19 Serialization Instructions 


Mnemonic Description Defined in 
SYNC Synchronize Shared Memory MIPS II 


(3) Conditional Move Instructions 


Instructions were added in MIPS IV to conditionally move one CPU general register to 
another based on the value in a third general register. 
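The effect of the conditional moves can be sketched with two small functions that return the new value of the destination register. The branch-free maximum that follows is a hypothetical usage example, with `slt` standing in for the Set on Less Than instruction on signed values.

```python
def movn(rd: int, rs: int, rt: int) -> int:
    """MOVN: the destination receives rs if rt is not zero, otherwise it keeps rd."""
    return rs if rt != 0 else rd


def movz(rd: int, rs: int, rt: int) -> int:
    """MOVZ: the destination receives rs if rt is zero, otherwise it keeps rd."""
    return rs if rt == 0 else rd


def slt(rs: int, rt: int) -> int:
    """Set on Less Than: 1 if rs < rt, else 0."""
    return 1 if rs < rt else 0


def maximum(a: int, b: int) -> int:
    """Select the larger value without a branch: result = a, then replaced by b if a < b."""
    result = a
    return movn(result, b, slt(a, b))


print(maximum(3, 7), maximum(7, 3))
```

A sequence of this shape avoids a conditional branch and its delay slot for simple selections.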


Table 1-20 CPU Conditional Move Instructions 


Mnemonic Description Defined in 
MOVN Move Conditional on Not Zero MIPS IV 
MOVZ Move Conditional on Zero IV 


(4) Prefetch (R10000 only) 


There are two prefetch advisory instructions; one with register+offset addressing and the
other with register+register addressing. These instructions advise that memory is likely to
be used in a particular way in the near future and should be prefetched into the cache. The
PREFX instruction using register+register addressing mode is coded in the FPU opcode
space along with the other operations using register+register addressing.
Table 1-21 Prefetch Using Register + Offset Address Mode


Mnemonic    Description         Defined in
PREF        Prefetch            MIPS IV


Table 1-22 Prefetch Using Register + Register Address Mode


Mnemonic    Description         Defined in
PREFX       Prefetch Indexed    MIPS IV


1.2.5 Coprocessor Instructions 


Coprocessors are alternate execution units, with register files separate from the CPU. The 
MIPS architecture provides an abstraction for up to 4 coprocessor units, numbered 0 to 3. 
Each architecture level defines some of these coprocessors as shown in Table 1-23. 
Coprocessor 0 is always used for system control and coprocessor 1 is used for the floating- 
point unit. Other coprocessors are architecturally valid, but do not have a reserved use. 
Some coprocessors are not defined and their opcodes are either reserved or used for other 
purposes. 


Table 1-23 Coprocessor Definition and Use in the MIPS Architecture 


                        MIPS architecture level
coprocessor     I               II              III             IV
0               Sys Control     Sys Control     Sys Control     Sys Control
1               FPU             FPU             FPU             FPU
2               unused          unused          unused          unused
3               unused          unused          not defined     FPU (COP1X)


The coprocessors may have two register sets, coprocessor general registers and coprocessor 
control registers, each set containing up to thirty two registers. Coprocessor 
computational instructions may alter registers in either set. 


System control for all MIPS processors is implemented as coprocessor 0 (CP0), the System
Control Coprocessor. It provides the processor control, memory management, and
exception handling functions. The CP0 instructions are specific to each CPU and are
documented with the CPU-specific information.


If a system includes a floating-point unit, it is implemented as coprocessor 1 (CP1). In
MIPS IV, the FPU also uses the computation opcode space for coprocessor unit 3, renamed
COP1X. The FPU instructions are documented in Chapter 2.


The coprocessor instructions are divided into two main groups:

• Load and store instructions that are reserved in the main opcode space.
• Coprocessor-specific operations that are defined entirely by the coprocessor.


(1) Coprocessor Load and Store 


Load and store instructions are not defined for CP0; the move to/from coprocessor
instructions are the only way to write and read the CP0 registers.


The loads and stores for coprocessors are summarized in 1.2.1 Load and Store
Instructions.


(2) Coprocessor Operations


There are up to four coprocessors and the instructions are shown generically for 
coprocessor-z. Within the operation main opcode, the coprocessor has further coprocessor- 
specific instructions encoded. 


Table 1-24 Coprocessor Operation Instructions 


Mnemonic Description Defined in 
COPz Coprocessor-z Operation MIPS I 


1.3 CP0 Instructions


Table 1-25 lists the CP0 instructions defined for the R5000 and the R10000 processors.


Table 1-25 CP0 Instructions


Mnemonic    Description                     Defined in
CACHE       Cache Operation                 MIPS III
DMFC0       Doubleword Move From CP0        MIPS III
DMTC0       Doubleword Move To CP0          MIPS III
ERET        Exception Return                MIPS III
MFC0        Move from CP0                   MIPS I
MTC0        Move to CP0                     MIPS I
TLBP        Probe TLB for Matching Entry    MIPS I
TLBR        Read Indexed TLB Entry          MIPS I
TLBWI       Write Indexed TLB Entry         MIPS I
TLBWR       Write Random TLB Entry          MIPS I
(1) Hazards


The R5000 has some instruction hazards and the results of executing certain combinations
of instructions are unpredictable. For details, see Chapter 3 R5000 Instruction Hazards.


The R10000 detects most of the pipeline hazards in hardware, including CP0 hazards and
load hazards. No NOP instructions are required to correct instruction sequences.


(2) Branch on Coprocessor 0 


On the R4400 processor, CacheOps that hit in the specified cache set the CH bit in the
Diagnostic field of the CP0 Status register (bit 18). Though it was undocumented, this bit
could be tested by the Branch on Coprocessor 0 instructions (BC0T, BC0F, BC0TL,
BC0FL).


The R5000 and the R10000 processors also implement the CH bit but it is not associated
with a Coprocessor 0 condition. Instead, execution of a branch on Coprocessor 0
instruction takes a Reserved Instruction exception.


(3) CPO Move Instructions 


The R5000 and the R10000 processors implement Coprocessor 0 move instructions, 
MTCO, MFCO, DMTCO, and DMFC0, exactly the same as in the R4400 processor, even 
though some operations are undefined during certain conditions. The exact operations of 
CPO move instructions on 32/64-bit CPO registers are summarized Table 1-26. 


Table 1-26 CP0 Move Instructions 


Instruction    CP0 Register Size    MIPS 3 Enable?    Operation
MFC0 rt,rd     32 or 64             Don’t care        rt <- (rd[31])^32 || rd[31..0]
MTC0 rt,rd     32                   Don’t care        rd <- rt[31..0]
               64                   Don’t care        rd <- rt[63..0]
DMFC0 rt,rd    32                   Yes               undefined (rt <- 0^32 || rd[31..0])
               64                   Yes               rt <- rd[63..0]
               32 or 64             No                Reserved Instruction exception
DMTC0 rt,rd    32                   Yes               undefined (rd <- rt[31..0])
               64                   Yes               rd <- rt[63..0]
               32 or 64             No                Reserved Instruction exception

(x^32 denotes 32 copies of bit x; || denotes concatenation.) 


The returned value of MFC0/DMFC0 from a non-existing CP0 register is undefined. 
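

The read operations of Table 1-26 can be sketched in C as shown below. This is an illustrative model only, not processor microcode: the names cp0_result, mfc0_model, and dmfc0_model are hypothetical, and the model assumes that MFC0 sign-extends bit 31 of the CP0 register into the upper half of rt. For the case the table marks as undefined, the model returns the value shown in parentheses in the table, which real hardware is not guaranteed to produce. 

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Outcome of a modeled CP0 move (hypothetical type, for illustration). */
typedef enum { CP0_OK, CP0_UNDEFINED, CP0_RESERVED_INSTRUCTION } cp0_result;

/* MFC0: rt receives the low 32 bits of rd, sign-extended to 64 bits.
   Assumption: the first row of Table 1-26 describes sign extension. */
static uint64_t mfc0_model(uint64_t rd)
{
    uint64_t low = rd & UINT64_C(0xFFFFFFFF);
    if (low & UINT64_C(0x80000000))
        low |= UINT64_C(0xFFFFFFFF00000000);
    return low;
}

/* DMFC0: depends on the CP0 register size and on whether MIPS III
   instructions are enabled. */
static cp0_result dmfc0_model(uint64_t rd, int reg_bits, bool mips3_enabled,
                              uint64_t *rt)
{
    assert(rt != NULL);
    assert(reg_bits == 32 || reg_bits == 64);
    if (!mips3_enabled)
        return CP0_RESERVED_INSTRUCTION;   /* rt is left untouched */
    if (reg_bits == 32) {
        *rt = rd & UINT64_C(0xFFFFFFFF);   /* value in parentheses in the table */
        return CP0_UNDEFINED;
    }
    *rt = rd;
    return CP0_OK;
}
```

A caller would treat CP0_UNDEFINED as a warning that the value in rt must not be relied on. 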
1.4 CACHE Instruction 


This section describes the operations of the CACHE instructions in the R5000 and the 
R10000 processors. 


NOTE: The operation of any operation/cache combination not listed below is 
undefined, and the operation of this instruction on uncached addresses is also 
undefined. 


(1) Virtual Address 


The CACHE instruction uses the following portions of the VA to specify a primary cache 
block and way: 


<R5000> 

• VA[13:5] defines a 32-byte block in the primary data or instruction cache array. 

• VA[14] defines the way needed by Index operations. 


<R10000> 

• VA[13:5] defines a 32-byte block in the primary data cache array. 

• VA[13:6] defines a 64-byte block in the primary instruction cache array. 

• In both cases, VA[0] defines the way needed by Index operations. 
  Since VA[0] is used to indicate the way, it does not cause alignment errors. 


When accessing data in the primary caches, VA[Blocksize-1] is also used to read or write 
a specific word. 


(2) Physical Address 


The CACHE instruction uses the following portions of the PA to specify a secondary cache 
block and way: 


<R5000> 

• PA[Size of secondary cache:Block size of secondary cache] is used to 
  access the secondary cache. 


<R10000> 

• PA[Size of secondary cache - 2:Blocksize of secondary cache] is used to 
  access the secondary cache. 

• PA[0] is used to specify the way needed by Index operations. 
  Since PA[0] is used to indicate the way during CACHE Index operations, 
  alignment errors are suppressed. 


When accessing data in the secondary cache, PA[Blocksize-1:3] is also used to read or 
write a specific doubleword. 


(3) CP0 Not Usable 


If the CP0 is not usable (if not in Kernel mode, CU0 must be set in the Status register for 
CP0 to be usable), a Coprocessor Unusable exception is taken. 
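

The virtual address fields listed under (1) Virtual Address can be extracted as in the following C sketch. It is a minimal illustration under the bit assignments stated above; the type and function names (cpu_model, pcache, cache_index, primary_index) are hypothetical and do not belong to any vendor library. 

```c
#include <assert.h>
#include <stdint.h>

typedef enum { CPU_R5000, CPU_R10000 } cpu_model;
typedef enum { PCACHE_INSTRUCTION, PCACHE_DATA } pcache;

/* Block index and way selected by an Index CacheOp on a primary cache. */
typedef struct {
    uint32_t block;
    uint32_t way;
} cache_index;

static cache_index primary_index(cpu_model cpu, pcache cache, uint64_t va)
{
    cache_index ix;
    assert(cpu == CPU_R5000 || cpu == CPU_R10000);
    if (cpu == CPU_R5000) {
        ix.block = (uint32_t)((va >> 5) & 0x1FFu);  /* VA[13:5], 32-byte blocks */
        ix.way   = (uint32_t)((va >> 14) & 1u);     /* VA[14] */
    } else if (cache == PCACHE_DATA) {
        ix.block = (uint32_t)((va >> 5) & 0x1FFu);  /* VA[13:5], 32-byte blocks */
        ix.way   = (uint32_t)(va & 1u);             /* VA[0] */
    } else {
        ix.block = (uint32_t)((va >> 6) & 0xFFu);   /* VA[13:6], 64-byte blocks */
        ix.way   = (uint32_t)(va & 1u);             /* VA[0] */
    }
    return ix;
}
```

For example, on the R10000 an Index operation on the primary instruction cache with VA = 0x41 selects block 1 in way 1, because VA[0] carries the way rather than a byte offset. 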
(4) TLB Refill and TLB Invalid Exceptions on CacheOps 


TLB Refill and TLB Invalid exceptions can occur on any operation. For Index operations, 
where the address (virtual address for the primary caches, physical address for the 
secondary cache) is used to index the cache but need not match the cache tag, unmapped 
addresses may be used to avoid TLB exceptions. The operation never causes TLB 
Modified exceptions. 


(5) Hit Operation Accesses 


A Hit operation accesses the specified cache as a normal data reference, and performs the 
specified operation if the cache block contains valid data at the specified physical address 
(a hit). 


The operation is undefined if a CacheOp hit occurs in both ways of the cache. 


(6) Watch Exception 


There is no Watch exception for CacheOps. 


(7) Address Error Exception 


During an Index CacheOp, bit 0 is not checked for an Address Error exception since this 
bit is used as the Way indicator bit, and may be non-zero. Bit 1 of an Index CacheOp can 
still generate an Address Error exception if it is not set to zero. 


For all remaining CacheOps, the low-order two bits of the instruction must be set to zero, 
or else they will generate an Address Error exception. 


A CacheOp is never checked for alignment Address Error exceptions, only for 
privilege-type Address Error exceptions. 


(8) Write Back 


Write back from the primary data cache goes to the secondary cache. Write back from a 
secondary cache always goes to the System interface unit. 


A secondary write back always writes the most recent data; the primary data cache must be 
interrogated, and any dirty inconsistent data written back to the secondary cache before the 
secondary block is written back to the system interface unit. The address to be written is 
specified by the cache tag and not the translated PA. 


(9) Invalidation 


When a block is invalidated in the secondary cache, all subset blocks in the primary cache 
are also invalidated. The StateMod bits on an invalidated block in the primary data cache are 
set to “001” (Normal) during any invalidation. 


(10) CE Bit 


The R5000 and the R10000 processors do not support the CE bit. The functionality of the 
CE bit has been replaced by the Index Load Data and Index Store Data instructions. 
(11) CH Bit 


The CH bit is supported in the R5000 and the R10000 processors. It is modified by a Hit 
Invalidate (S) or Hit WriteBack Invalidate (S) CACHE instruction. CH is set if there is a 
hit in the secondary cache, and cleared if there is a miss. The CH bit can also be modified 
by an MTC0 instruction. 
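

Software that inspects the CH bit works on a copy of the Status register value. The C sketch below shows the masking involved. It assumes that the CH bit sits at bit 18 of the Status register on these processors, as stated for the R4400 in this chapter, and it leaves the actual MFC0/MTC0 access to the caller. The macro and function names are hypothetical. 

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: CH is bit 18 of the CP0 Status register. */
#define STATUS_CH_SHIFT 18u
#define STATUS_CH_MASK  (UINT32_C(1) << STATUS_CH_SHIFT)

/* Test CH in a Status value previously read with MFC0. */
static bool status_ch_is_set(uint32_t status)
{
    return (status & STATUS_CH_MASK) != 0;
}

/* Return a Status value with CH forced to the given state; the caller
   would write the result back with MTC0. All other bits are preserved. */
static uint32_t status_with_ch(uint32_t status, bool ch)
{
    uint32_t result = ch ? (status | STATUS_CH_MASK)
                         : (status & ~STATUS_CH_MASK);
    assert(status_ch_is_set(result) == ch);
    return result;
}
```

A typical sequence clears CH with status_with_ch(status, false), issues the Hit CacheOp, then rereads Status and calls status_ch_is_set. 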


(12) Serial Operation of CACHE Instructions 


All CACHE instruction variations are performed serially. From the aspect of the primary 
cache, this means CACHE instructions can impede the instruction stream. For this reason, 
load/store speculation is not allowed beyond a CACHE instruction until the CACHE 
instruction has graduated. All load/store accesses, including writebacks to the external 
agent, must be complete before the CACHE instruction can graduate, and any load/store 
following a CACHE instruction cannot be issued speculatively until the CACHE 
instruction graduates. Uncached operations and instruction fetches are not affected. 


(13) Instructions Not Supported 
The processors do not support the following CACHE instructions: 


<R5000> 

• Cache Barrier 
• Index Load Data 
• Index Store Data 
• Hit Set Virtual variations 


<R10000> 

• Create DirtyExclusive 
• Hit WriteBack 
• Fill (I) 
• Hit Set Virtual variations 
• Flash 
• Page Invalidate 


(14) Op Field Encoding 
Table 1-27 presents the Op field encoding for the CACHE instruction. Encodings not listed 
in this table are undefined. 


Table 1-27 CACHE Instruction Op Field Encoding 


            CACHE Instruction Variation
Op Field    R5000                        R10000                       Target Cache
00000       Index Invalidate             Index Invalidate             I
00100       Index Load Tag               Index Load Tag               I
01000       Index Store Tag              Index Store Tag              I
10000       Hit Invalidate               Hit Invalidate               I
10100       Fill                         Cache Barrier                I (Fill)
11000       Hit Writeback                Index Load Data              I
11100       -                            Index Store Data             I
00001       Index Writeback Invalidate   Index Writeback Invalidate   D
00101       Index Load Tag               Index Load Tag               D
01001       Index Store Tag              Index Store Tag              D
01101       Create Dirty Exclusive       -                            D
10001       Hit Invalidate               Hit Invalidate               D
10101       Hit Writeback Invalidate     Hit Writeback Invalidate     D
11001       Hit Writeback                Index Load Data              D
11101       -                            Index Store Data             D
00011       Flash                        Index Writeback Invalidate   S
00111       Index Load Tag               Index Load Tag               S
01011       Index Store Tag              Index Store Tag              S
10011       -                            Hit Invalidate               S
10111       Page Invalidate              Hit Writeback Invalidate     S
11011       -                            Index Load Data              S
11111       -                            Index Store Data             S
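

In every row of Table 1-27 the two low-order bits of the Op field select the target cache (00 = I, 01 = D, 11 = S) and the three high-order bits select the operation. The C sketch below splits and rebuilds an Op field on that observation. The function names are hypothetical; the value 10 in the low-order bits is rejected because no such encoding is listed in the table. 

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Split a 5-bit CACHE Op field into operation (bits 4:2) and target
   cache letter (bits 1:0). Returns false for encodings whose target
   bits are not listed in Table 1-27. */
static bool cache_op_split(unsigned op, unsigned *operation, char *target)
{
    assert(operation != NULL && target != NULL);
    if (op > 0x1Fu)
        return false;
    switch (op & 3u) {
    case 0u: *target = 'I'; break;   /* primary instruction cache */
    case 1u: *target = 'D'; break;   /* primary data cache */
    case 3u: *target = 'S'; break;   /* secondary cache */
    default: return false;           /* 10: not listed, undefined */
    }
    *operation = (op >> 2) & 7u;
    return true;
}

/* Rebuild the Op field from an operation number and target bits. */
static unsigned cache_op_join(unsigned operation, unsigned target_bits)
{
    assert(operation <= 7u && target_bits <= 3u);
    return (operation << 2) | target_bits;
}
```

For example, Op field 10101 splits into operation 5 on target D, which the table lists as Hit Writeback Invalidate (D). A valid split does not by itself mean the variation is supported on a given processor; that still has to be checked against the table. 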
1.4.1 Index Invalidate (I) 


Index Invalidate (I) sets a block in the primary instruction cache to Invalid. VA[13:5] 
(R5000) or VA[13:6] (R10000) defines the address and VA[14] (R5000) or VA[0] 
(R10000) defines the way to be invalidated. 


The invalidation takes place by writing the primary instruction cache state bit to 0 (Invalid). 
This also sets the instruction cache state parity bit to 0. 


The LRU bit (R10000) does not change. 


Parity check is suppressed. 


1.4.2 Index Writeback Invalidate (D) 


Index Writeback Invalidate (D) sets a block in the primary data cache to Invalid. VA[13:5] 
defines the address and VA[14] (R5000) or VA[0] (R10000) defines the way to be 
invalidated. 


The invalidation takes place by writing the following bits: 
• the primary data cache state bits are set to 00 (Invalid) 
• the SCWay bit is set to 0 (R10000) 
• the StateMod bits are set to 001 (Normal) (R10000) 
• the state parity is set to 0 (R10000). 
The LRU bit (R10000) does not change. 


If the StateMod of the block to be invalidated = 010 (Inconsistent), the block in the 
primary data cache must be written back to the secondary cache (R10000). 


The address and way in the secondary cache to be written back to are read out of the 
primary data cache tag address and secondary way fields and all 32 bytes are written back 
(R10000). 


Only the data field of the secondary cache is modified by this instruction since the 
processor follows state and data subset rules. 


Since the CE bit is not defined in the R5000 and the R10000 processors, this instruction no 
longer has a CP0 ECC register mode. 


1.4.3 Index Writeback Invalidate (S) (R10000 only) 


The Index Writeback Invalidate (S) instruction sets a block in the secondary cache to 
Invalid and writes back any dirty data to the System interface unit. This operation extends 
to any blocks in the primary data or instruction caches which are subsets of the secondary 
cache block. 


The CACHE instruction physical address, PA[Cachesize-2..Blocksize], defines the 
address and PA[0] defines the way to be invalidated. 


The invalidation occurs in the following sequence: 


1. The processor reads the STag, PIdx, and State bits from the secondary cache tag array. 
   If State = 00 (Invalid), no further activity takes place. If there is a valid entry, then the 
   STag is used to interrogate the primary instruction and data caches. 

2. The processor reads each subset block from the primary instruction cache. If ITag = 
   STag and IState = 1 (Valid), then the block is invalidated by writing the IState bit to 
   0 (Invalid) and the IState parity bit to 0. 

3. Read each subset block from the primary data cache. If DTag = STag and DState is 
   not equal to 00 (Invalid), then write the DState bits = 00 (Invalid), the StateMod bits 
   = 001 (Normal), the SCWay bit = 0, and the DState parity bit = 0. If the original block 
   is DState = 11 (Dirty) and StateMod = 010 (Inconsistent), also write this block back 
   to the secondary cache using the DTag and the SCWay bit from the primary data tag 
   array. 

4. Set the state of the secondary cache block to 00 (Invalid). Since the secondary cache 
   is designed so all tag bits must be written at once, the Tag, VA, and ECC bits are also 
   written. The tag is written with the PA and VA[13:12] (virtual index) of the original 
   CACHE instruction address. The ECC is generated. 

5. If the secondary cache block’s original State bits were 11 (Dirty), the block is written 
   back to the system interface unit. If the block’s State was Shared or CleanExclusive, 
   the system interface unit is notified with a Tag Invalidation request that the block has 
   been deleted. 

6. The MRU bit is set to point away from the block invalidated unless the line was already 
   invalid. 


1.4.4 Flash (S) (R5000 only) 


Flash the entire secondary cache in one operation for tag RAMs which support this 
function. 
1.4.5 Index Load Tag (I) 


Index Load Tag (I) reads the primary instruction cache tag fields into the CP0 TagLo and 
TagHi registers. VA[13:5] (R5000) or VA[13:6] (R10000) defines the address and VA[14] 
(R5000) or VA[0] (R10000) defines the way of the tag to be read. 


All parity errors caused by Index Load Tag (I) are ignored. 


The following mapping defines the operation: 


<R5000> 
TagLo[0] = Tag parity bit 
TagLo[5:2] = Predecode bits 
TagLo[7:6] = State bits 
TagLo[31:8] = Tag[35:12] 
<R10000> 
TagLo[0] = Tag parity bit 
TagLo[2] = State parity bit 
TagLo[3] = LRU bit 
TagLo[6] = State bit 
TagLo[31:8] = Tag[35:12] 
TagHi[3:0] = Tag[39:36] 


All other CP0 TagLo and TagHi bits are set to 0. 


1.4.6 Index Load Tag (D) 
Index Load Tag (D) reads the primary data cache tag fields into the CP0 TagLo and TagHi 
registers. VA[13:5] defines the address and VA[14] (R5000) or VA[0] (R10000) defines 
the way of the tag to be read. 


All parity errors caused by Index Load Tag (D) are ignored. The following mapping 
defines the operation: 


<R5000> 
TagLo[0] = Tag parity bit 
TagLo[7:6] = State bits 
TagLo[31:8] = Tag[35:12] 
<R10000> 
TagLo[0] = Tag parity bit 
TagLo[1] = SCWay 
TagLo[2] = State parity bit 
TagLo[3] = LRU bit 
TagLo[7:6] = State bits 
TagLo[31:8] = Tag[35:12] 
TagHi[3:0] = Tag[39:36] 
TagHi[31:29] = StateMod bits 
All other CP0 TagLo and TagHi bits are set to 0. 
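

The R10000 mapping above can be turned into a small decoder for the TagLo and TagHi values returned by Index Load Tag (D). The following C sketch is illustrative only; the structure and function names (r10k_dtag, r10k_decode_dtag) are hypothetical. The tag is returned at its physical address bit positions, so bits 11:0 of the result are always zero. 

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an R10000 primary data cache tag, as laid out in
   TagLo/TagHi by Index Load Tag (D). */
typedef struct {
    unsigned tag_parity;    /* TagLo[0] */
    unsigned sc_way;        /* TagLo[1] */
    unsigned state_parity;  /* TagLo[2] */
    unsigned lru;           /* TagLo[3] */
    unsigned state;         /* TagLo[7:6] */
    unsigned state_mod;     /* TagHi[31:29] */
    uint64_t tag;           /* Tag[39:12], kept at bit positions 39:12 */
} r10k_dtag;

static r10k_dtag r10k_decode_dtag(uint32_t taglo, uint32_t taghi)
{
    r10k_dtag t;
    t.tag_parity   = taglo & 1u;
    t.sc_way       = (taglo >> 1) & 1u;
    t.state_parity = (taglo >> 2) & 1u;
    t.lru          = (taglo >> 3) & 1u;
    t.state        = (taglo >> 6) & 3u;
    t.state_mod    = (taghi >> 29) & 7u;
    /* TagLo[31:8] = Tag[35:12], TagHi[3:0] = Tag[39:36] */
    t.tag = ((uint64_t)(taghi & 0xFu) << 36) | ((uint64_t)(taglo >> 8) << 12);
    assert((t.tag & UINT64_C(0xFFF)) == 0);
    return t;
}
```

The same shifts, applied in the opposite direction, build the TagLo and TagHi values expected by Index Store Tag (D). 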
1.4.7 Index Load Tag (S) 


Index Load Tag (S) reads the secondary cache tag fields into the CP0 TagLo and TagHi 
registers. The PA[Cachesize..Blocksize] (R5000) or PA[Cachesize-2..Blocksize] 
(R10000) defines the address and PA[0] (R10000) defines the way to be read. 


All parity and ECC errors caused by Index Load Tag (S) are ignored. 
The following mapping defines the operation: 


<R5000> 
TagLo[9:7] = Virtual index bits 
TagLo[12:10] = State bits 
TagLo[31:13] = Tag[35:17] 
<R10000> 
TagLo[6:0] = Tag ECC bits 
TagLo[8:7] = Virtual index bits 
TagLo[11:10] = State bits 
TagLo[31:14] = Tag[35:18] 
TagHi[3:0] = Tag[39:36] 
TagHi[31] = MRU Bit 
All other CP0 TagLo and TagHi register bits are set to 0. 


1.4.8 Index Store Tag (I) 
Index Store Tag (I) stores the CP0 TagLo and TagHi registers into the primary instruction 
cache tag array. VA[13:5] (R5000) or VA[13:6] (R10000) defines the address and VA[14] 
(R5000) or VA[0] (R10000) defines the way of the tag to be written. 


The following mapping defines the operation: 


<R5000> 
Tag parity bit = TagLo[0] 
Predecode bits = TagLo[5:2] 
State bits = TagLo[7:6] 
Tag[35:12] = TagLo[31:8] 
<R10000> 
Tag parity bit = TagLo[0] 
State parity bit = TagLo[2] 
LRU bit = TagLo[3] 
State bit = TagLo[6] 
Tag[35:12] = TagLo[31:8] 
Tag[39:36] = TagHi[3:0] 


All the Tag fields, including parity, are directly written. 


Parity check is suppressed for all Index Store Tags. 
1.4.9 Index Store Tag (D) 


Index Store Tag (D) stores the CP0 TagLo and TagHi registers into the primary data cache 
tag array. VA[13:5] defines the address and VA[14] (R5000) or VA[0] (R10000) defines 
the way of the tag to be written. 


The following mapping defines the operation: 


<R5000> 
Tag parity bit = TagLo[0] 
State bits = TagLo[7:6] 
Tag[35:12] = TagLo[31:8] 
<R10000> 
Tag parity bit = TagLo[0] 
SCWay = TagLo[1] 
State parity bit = TagLo[2] 
LRU bit = TagLo[3] 
State bits = TagLo[7:6] 
Tag[35:12] = TagLo[31:8] 
Tag[39:36] = TagHi[3:0] 
StateMod bits = TagHi[31:29] 


All Tag fields, including parity, are directly written. 


Parity check is suppressed for all Index Store Tags. 


1.4.10 Index Store Tag (S) 
Index Store Tag (S) stores fields from the CP0 TagLo and TagHi registers into the 
secondary cache tag and MRU array fields. The PA[Cachesize..Blocksize] (R5000) or 
PA[Cachesize-2..Blocksize] (R10000) defines the address and PA[0] (R10000) defines 
the way to be written. 


The following mapping defines the operation: 


<R5000> 
Virtual index bits = TagLo[9:7] 
Status bits = TagLo[12:10] 
Tag[35:17] = TagLo[31:13] 
<R10000> 
Tag ECC bits = TagLo[6:0] 
Virtual index bits = TagLo[8:7] 
Status bits = TagLo[11:10] 
Tag[35:18] = TagLo[31:14] 
Tag[39:36] = TagHi[3:0] 
MRU bit = TagHi[31] 


All Tag fields, including ECC, are directly written. 


Parity check is suppressed for all Index Store Tags. 
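

As an illustration of the R5000 mapping above, the C sketch below packs a TagLo value for Index Store Tag (S) from a physical address, a virtual index, and a state value. The function name r5k_pack_stag_lo is hypothetical, and the meaning of the individual state encodings is not modeled; the caller supplies the raw 3-bit value. 

```c
#include <assert.h>
#include <stdint.h>

/* Build the TagLo value for an R5000 Index Store Tag (S):
     TagLo[9:7]   = virtual index bits
     TagLo[12:10] = status (state) bits
     TagLo[31:13] = Tag[35:17] of the physical address */
static uint32_t r5k_pack_stag_lo(uint64_t pa, unsigned vindex, unsigned state)
{
    uint32_t tag;
    assert(vindex <= 7u && state <= 7u);
    tag = (uint32_t)((pa >> 17) & 0x7FFFFu);   /* Tag[35:17], 19 bits */
    return (tag << 13) | ((uint32_t)state << 10) | ((uint32_t)vindex << 7);
}
```

The result would be moved to TagLo with MTC0 before the CACHE instruction is issued. Bits 6:0 are left at zero because the R5000 mapping does not assign them. 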


1.4.11 Create Dirty Exclusive (D) (R5000 only) 


This operation is used to avoid loading data needlessly from secondary cache or memory 
when writing new contents into an entire cache block. 


If the cache block does not contain the specified address, and the block is dirty, write it back 
to the secondary cache (if present) and to memory. 


In all cases, set the cache block tag to the specified physical address and set the cache state to 
Dirty Exclusive. 


1.4.12 Hit Invalidate (I) 


Hit Invalidate (I) invalidates an entry in the instruction cache which matches the PA of the 
CACHE instruction. The tags of both ways at VA[13:5] (R5000) or VA[13:6] (R10000) are read 
from the instruction cache. 


If the PState is 1 (Valid), and the PA of the CACHE instruction matches the Tag from the 
instruction cache tag array, the PState bit of the entry is written to 0 (invalid) and the 
PState parity bit is written to 0 (R10000). 


The LRU bit (R10000) does not change. 
Parity error is checked. 


Hit CacheOps can cause cache error exceptions if they check ECC or parity bits. 
1.4.13 Hit Invalidate (D) 


Hit Invalidate (D) invalidates an entry in the data cache which matches the PA of the 
CACHE instruction. The tags of both ways at VA[13:5] are read from the data cache. 


If the PState is not equal to 00 (Invalid) and the PA of the CACHE instruction matches the 
DTag from the data cache tag array, then the PState bits are written to 00 (Invalid), the 
SCWay bit is written to 0 (R10000), the StateMod bits are written to 001 (Normal) (R10000), 
and the PState parity is written to 0 (R10000). 


The LRU bit (R10000) is left unchanged. 
Parity check is enabled. 


Hit CacheOps can cause cache error exceptions if they check ECC or parity bits. 
1.4.14 Hit Invalidate (S) (R10000 only) 


Hit Invalidate (S) invalidates all entries in the secondary, primary instruction, and primary 
data caches which match the PA of the CACHE instruction. The following sequence takes 
place: 


1. The processor reads the Tags from both ways of the secondary cache at the address 
   pointed to by the PA of the CACHE instruction. If the tag entry’s STag matches the 
   CACHE instruction PA, and the State of the entry is not equal to 00 (Invalid), then a 
   Hit has occurred in that entry. If there is no Hit, the CACHE instruction completes. 

2. The processor checks each entry in the primary caches to determine which corresponds 
   to the CACHE instruction PA and the PIdx read from the secondary cache tag array. 
   Any entry which matches is invalidated. No write back is required by Hit Invalidate 
   (S). 

3. The processor sets the tag array entry of the secondary cache block which was hit to 
   State = 00 (Invalid), Tag = PA of CACHE instruction, and PIdx = VA[13:12] of 
   CACHE instruction. ECC is generated. 

4. The MRU bit is written to point to the way opposite to that being invalidated. 

5. If the processor Eliminate Request mode bit, PrcElmReq, is set, a processor eliminate 
   request is sent to notify the external agent that a block in the secondary cache has been 
   invalidated. 


Hit Invalidate (S) sets the CH bit if it hits in the secondary cache. 


Once the CH bit is set it stays set until cleared by an MTC0 instruction, or the next 
CacheOp that can change the CH bit. 


Hit CacheOps can cause cache error exceptions if they check ECC or parity bits. 


1.4.15 Fill (I) (R5000 only) 


Fill the primary instruction cache block from secondary cache or memory. 


1.4.16 Cache Barrier (R10000 only) 


Cache Barrier does not change any cache fields. It is used when serialization of a CACHE 
instruction is needed without unwanted side effects. For more information, see the section 
titled Serial Operation of CACHE Instructions, in this chapter. 


1.4.17 Hit Writeback Invalidate (D) 


Hit Writeback Invalidate (D) invalidates an entry in the primary data cache which matches 
the PA of the CACHE instruction. In addition, it writes back to the secondary cache any 
DirtyExclusive or Inconsistent data found in the primary data cache. Both way DTags at 
VA[13:5] are read from the data cache. 


If the PState is not equal to 00 (Invalid) and the PA of the CACHE instruction matches the 
DTag, then the PState bits of the entry are set to 00 (Invalid), the SCWay is set to 0, the 
PState parity is set to 0 (R10000), and the StateMod bits are set to 001 (Normal) 
(R10000). 


The LRU bit (R10000) is left unchanged. 


If the state of the block to be invalidated was found to be StateMod = 010 (Inconsistent), 
the block in the primary data cache must be written back to the secondary cache. The 
address and way in the secondary cache to be written back to are read out of the primary 
data cache Tag Address and secondary way fields, and all 32 bytes are written back 
(R10000). 


Only the data field of the secondary cache is modified by this instruction since the 
processor obeys State and data subset rules. 


Since the CE bit is not defined in the R5000 and the R10000 processors, this instruction no 
longer has an ECC register mode. 


Hit CacheOps can cause cache error exceptions if they check ECC or parity bits. 


1.4.18 Hit Writeback Invalidate (S) (R10000 only) 


Hit Writeback Invalidate (S) checks for a block which matches the CACHE instruction PA 
in the secondary cache, invalidates it, and writes back any dirty data to the System interface 
unit. This operation extends to any blocks in the primary data or instruction caches which 
are subsets of the secondary cache block. The operation takes place in the following 
sequence: 


1. The processor reads the STag, PIdx, and State bits from both ways of the secondary 
   tag array. 

2. If the PA of the CACHE instruction matches the STag, and the State does not equal 
   00 (Invalid), a hit has occurred. If there is a hit, the STag is used to interrogate the 
   primary caches. If there is not a hit, the instruction ends. 

3. The processor reads each subset block from the primary instruction cache. If there is a 
   match, the block is invalidated by writing the IState bit to 0 (Invalid) and the IState 
   parity bit to 0. 

4. Read each subset block from the primary data cache. If there is a match, then write the 
   DState bits = 00 (Invalid), the StateMod bits = 001 (Normal), the SCWay bit = 0, and 
   the DState parity bit = 0. If the original State of any subset block is StateMod = 010 
   (Inconsistent), also write it back to the secondary cache using the DTag and the 
   secondary way bit from the primary data tag array. 

5. Write the State of the secondary cache block = 00 (Invalid). Since the secondary cache 
   is designed so all tag bits must be written at once, the STag, PIdx, and ECC bits are 
   also written. The STag is written with whatever the PA and VA[13:12] of the original 
   CACHE instruction were. The Tag ECC is generated. 

6. If the secondary block’s original State bits were 11 (Dirty), then the block is written 
   back to the system interface unit. If the block’s State was Shared or CleanExclusive, 
   the system interface unit is simply notified that the block has been deleted with a “Tag 
   Invalidation” request. 

7. The MRU bit is set to point away from the block invalidated. 


Hit WriteBack Invalidate (S) sets the CH bit if it hits in the secondary cache. Once the CH 
bit is set it stays set until cleared by an MTC0 instruction. 


Hit CacheOps can cause cache error exceptions if they check ECC or parity bits. 
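

The secondary cache side of the sequence above (steps 1, 2, 5, 6, and 7) can be summarized by the following C model. It is a simplified sketch under stated assumptions and not a description of the hardware implementation: the type and function names are hypothetical, the interrogation of the primary caches (steps 3 and 4) is omitted, and the encodings chosen for Shared and CleanExclusive are assumptions, since the text gives only 00 (Invalid) and 11 (Dirty). On a miss the model clears CH, following the description in (11) CH Bit. 

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Secondary cache states. Only 00 (Invalid) and 11 (Dirty) come from the
   text; the Shared and CleanExclusive encodings are assumptions. */
enum { SC_INVALID = 0, SC_SHARED = 1, SC_CLEAN_EXCLUSIVE = 2, SC_DIRTY = 3 };

/* Request sent to the system interface unit, if any. */
typedef enum { SYS_NONE, SYS_WRITEBACK, SYS_TAG_INVALIDATE } sys_request;

typedef struct {
    uint64_t stag;    /* secondary tag */
    unsigned state;   /* one of the SC_* values */
} sc_tag;

typedef struct {
    sc_tag   way[2];  /* two-way set */
    unsigned mru;     /* way the MRU bit points to */
} sc_set;

static sys_request hit_writeback_invalidate_s(sc_set *set, uint64_t pa_tag,
                                              bool *ch)
{
    unsigned w;
    assert(set != NULL && ch != NULL);
    for (w = 0; w < 2; w++) {
        sc_tag *t = &set->way[w];
        if (t->state != SC_INVALID && t->stag == pa_tag) {   /* step 2: hit */
            unsigned original = t->state;
            t->state = SC_INVALID;                           /* step 5 */
            t->stag  = pa_tag;
            set->mru = w ^ 1u;                               /* step 7 */
            *ch = true;
            return original == SC_DIRTY ? SYS_WRITEBACK      /* step 6 */
                                        : SYS_TAG_INVALIDATE;
        }
    }
    *ch = false;   /* miss: the instruction ends */
    return SYS_NONE;
}
```

A dirty hit therefore yields a write back to the system interface unit, while a Shared or CleanExclusive hit yields only a Tag Invalidation request. 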


1.4.19 Page Invalidate (S) (R5000 only) 


The processor will generate a page invalidate by doing a burst of 128 line invalidates to the 
secondary cache at the page specified by the effective address generated by the CACHE 
instruction, which must be page aligned. Interrupts are deferred during page invalidates. 


1.4.20 Hit Writeback (I) (R5000 only) 


If the cache block contains the specified address, data is written back unconditionally. 


1.4.21 Hit Writeback (D) (R5000 only) 


If the cache block contains the specified address, and its state is Dirty, write back the data 
and clear the state to not Dirty. 


1.4.22 Index Load Data (I) (R10000 only) 


Index Load Data (I) loads a single instruction from the primary instruction cache into the 
CP0 TagHi, TagLo, and ECC registers. A predecoded instruction in R10000 is 36 bits of 
data and one bit of parity. The address of the target instruction is VA[13:2] of the CACHE 
instruction. The way of the target instruction is VA[0] of the CACHE instruction. The 
instruction itself is loaded into CP0 TagHi[3:0] and TagLo[31:0]. The parity bit is loaded 
into CP0 ECC[0]. The tag field is not read. 


Parity checking is suppressed during operation of Index Load Data (I). 


1.4.23 Index Load Data (D) (R10000 only) 


Index Load Data (D) loads a singleword of data and the corresponding four bits of byte 
parity into CP0 TagLo and ECC. The address of the target singleword is VA[13:2] of the 
CACHE instruction. The way of the target singleword is VA[0] of the CACHE instruction. 
The singleword of data will be loaded into the CP0 TagLo register. The byte parity will be 
loaded into the CP0 ECC[3:0] register. The tag field is not read. 


Parity checking is suppressed during operation of Index Load Data (D). 
1.4.24 Index Load Data (S) (R10000 only) 


Index Load Data (S) loads a doubleword of data and all 10 check bits into the CP0 TagHi, 
TagLo, and ECC registers. The address of the target doubleword comes from the PA of 
the CACHE instruction. The way comes from PA[0] of the CACHE instruction. The high 
word will be loaded into CP0 TagHi and the low word of data will be loaded into CP0 
TagLo. The check bits will be loaded into CP0 ECC[9:0]. The MRU field is unmodified. 


ECC correction and checking is suppressed during Index Load Data (S). 


1.4.25 Index Store Data (I) (R10000 only) 


Index Store Data (I) stores a single instruction into the primary instruction cache. The 
address where this instruction will be written comes from VA[13:2] of the CACHE 
instruction. The way where the data will be written comes from VA[0] of the CACHE 
instruction. The instruction itself comes from CP0 TagHi[3:0] and TagLo[31:0]. The parity 
bit, which comes from CP0 ECC[0], is also stored. The data to be stored bypasses the 
predecode and is written directly into the instruction cache. The tag field is unmodified. 


1.4.26 Index Store Data (D) (R10000 only) 


Index Store Data (D) stores a word of data and its byte parity into the data cache from the 
CP0 TagLo and ECC registers. The address where this word will be written is defined by 
VA[13:2] of the CACHE instruction. The way is defined by VA[0]. The data word comes 
from CP0 TagLo. The parity bits come from CP0 ECC[3:0]. The data cache tag array, 
including the LRU bit, is left unchanged. 


1.4.27 Index Store Data (S) (R10000 only) 
Index Store Data (S) stores a quadword of data and 10 check bits into the secondary cache 
data array. It stores a doubleword of data from CP0 TagHi and TagLo and pads the 
remaining doubleword with zeroes. This allows the ECC and parity, which are based on the 
quadword, to be valid for the doubleword of data stored. The address of the quadword 
stored is defined by the PA of the CACHE instruction, and the way is defined by PA[0]. 
The data stored in the non-padded doubleword comes from CP0 TagHi and TagLo. The 
check bits are stored from ECC[9:0]. The tag array, including the MRU bit, is left 
unchanged. 
1.5 Defining Access Types 


Access type indicates the size of the data item to be loaded or stored by the R5000 and the 
R10000 processors, and is set by the load or store instruction opcode. 


Regardless of access type or byte ordering (endianness), the address given specifies the 
low-order byte in the addressed field. For a big-endian configuration, the low-order byte is 
the most-significant byte; for a little-endian configuration, the low-order byte is the 
least-significant byte. 


The access type, together with the three low-order bits of the address, defines the bytes 
accessed within the addressed doubleword (shown in Table 1-28). Only the combinations 
shown in Table 1-28 are permissible; other combinations cause address error exceptions. 


Table 1-28 Byte Access within a Doubleword 


                        Low-Order       Bytes Accessed
Access Type Mnemonic    Address Bits    Big endian         Little endian
(Value)                                 (63...31...0)      (63...31...0)
                        2  1  0         Byte               Byte
Doubleword (7)          0  0  0         0 1 2 3 4 5 6 7    7 6 5 4 3 2 1 0
Septibyte (6)           0  0  0         0 1 2 3 4 5 6        6 5 4 3 2 1 0
                        0  0  1           1 2 3 4 5 6 7    7 6 5 4 3 2 1
Sextibyte (5)           0  0  0         0 1 2 3 4 5            5 4 3 2 1 0
                        0  1  0             2 3 4 5 6 7    7 6 5 4 3 2
Quintibyte (4)          0  0  0         0 1 2 3 4                4 3 2 1 0
                        0  1  1               3 4 5 6 7    7 6 5 4 3
Word (3)                0  0  0         0 1 2 3                    3 2 1 0
                        1  0  0                 4 5 6 7    7 6 5 4
Triplebyte (2)          0  0  0         0 1 2                        2 1 0
                        0  0  1           1 2 3                    3 2 1
                        1  0  0                 4 5 6        6 5 4
                        1  0  1                   5 6 7    7 6 5
Halfword (1)            0  0  0         0 1                            1 0
                        0  1  0             2 3                    3 2
                        1  0  0                 4 5            5 4
                        1  1  0                     6 7    7 6
Byte (0)                0  0  0         0                                0
                        0  0  1           1                            1
                        0  1  0             2                        2
                        0  1  1               3                    3
                        1  0  0                 4                4
                        1  0  1                   5            5
                        1  1  0                     6        6
                        1  1  1                       7    7
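

The permitted combinations of Table 1-28 can be checked with a small lookup, as in the following C sketch. The function name access_bytes is hypothetical. It takes the access type value (0 to 7) and the three low-order address bits, and reports the first and last byte number accessed within the doubleword. Byte numbers equal address offsets within the doubleword, so the same result applies to both byte orderings; only the position of each byte within the 64-bit value differs. 

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true and sets *first / *last when the combination of access
   type and low-order address bits appears in Table 1-28. Returns false
   for combinations that would cause an address error exception. */
static bool access_bytes(unsigned access_type, unsigned low3,
                         unsigned *first, unsigned *last)
{
    /* Bit n of allowed[t] is set when low-order address value n is
       permitted for access type t. */
    static const uint8_t allowed[8] = {
        0xFF, /* Byte (0):       any offset */
        0x55, /* Halfword (1):   0, 2, 4, 6 */
        0x33, /* Triplebyte (2): 0, 1, 4, 5 */
        0x11, /* Word (3):       0, 4 */
        0x09, /* Quintibyte (4): 0, 3 */
        0x05, /* Sextibyte (5):  0, 2 */
        0x03, /* Septibyte (6):  0, 1 */
        0x01  /* Doubleword (7): 0 */
    };
    assert(first != NULL && last != NULL);
    if (access_type > 7u || low3 > 7u)
        return false;
    if (((allowed[access_type] >> low3) & 1u) == 0)
        return false;
    *first = low3;                 /* lowest byte number accessed */
    *last  = low3 + access_type;   /* access type value = size - 1 */
    return true;
}
```

For example, a Sextibyte access with low-order address bits 010 covers bytes 2 through 7, while the same access type with bits 001 is rejected. 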
1.6 Memory Access Types 


MIPS systems provide a few memory access types that are characteristic ways to use 
physical memory and caches to perform a memory access. The memory access type is 
specified as a cache coherence algorithm (CCA) in the TLB entry for a mapped virtual 
page. The access type used for a location is associated with the virtual address, not the 
physical address or the instruction making the reference. Implementations without 
multiprocessor (MP) support provide uncached and cached accesses. Implementations 
with MP support provide uncached, cached noncoherent and cached coherent accesses. 
The memory access types use the memory hierarchy as follows: 


(1) Uncached 


Physical memory is used to resolve the access. Each reference causes a read or write to 
physical memory. Caches are neither examined nor modified. 


(2) Cached Noncoherent 


Physical memory and the caches of the processor performing the access are used to resolve 
the access. Other caches are neither examined nor modified. 


(3) Cached Coherent 


Physical memory and all caches in the system containing a coherent copy of the physical 
location are used to resolve the access. A copy of a location is coherent (noncoherent) if 
the copy was placed in the cache by a cached coherent (cached noncoherent) access. 
Caches containing a coherent copy of the location are examined and/or modified to keep 
the contents of the location coherent. It is unpredictable whether caches holding a 
noncoherent copy of the location are examined and/or modified during a cached coherent 
access. 


(4) Cached 


For early 32-bit processors without MP support, cached is equivalent to cached 
noncoherent. If an instruction description mentions the cached noncoherent access type, 
the comment applies equally to the cached access type in a processor that has the cached 
access type. 


For processors with MP support, cached is a collective term, e.g. “cached memory” or 
“cached access”, that includes both cached noncoherent and cached coherent. Such a 
collective use does not imply that cached is an access type; it means that the statement 
applies equally to cached noncoherent and cached coherent access types. 
1.6.1 Mixing References with Different Access Types 


It is possible to have more than one virtual location simultaneously mapped to the same 
physical location. The memory access type used for the virtual mappings may be different, 
but it is not generally possible to use mappings with different access types at the same time. 


A processor executing load and store instructions must observe the effect of the load and 
store instructions to a physical location in the order that they occur in the instruction stream 
(i.e. program order) for all accesses to virtual locations with the same memory access type. 


If a processor executes a load or store using one access type to a physical location, the 
behavior of a subsequent load or store to the same location using a different memory access 
type is undefined unless a privileged instruction sequence is executed between the two 
accesses. Each implementation has a privileged implementation-specific mechanism that 
must be used to change the access type being used to access a location. 


The memory access type of a location affects the behavior of I-fetch, load, store, and
prefetch operations to the location. In addition, memory access types affect some
instruction descriptions. Load linked (LL, LLD) and store conditional (SC, SCD) have
defined operation only for locations with cached memory access type. SYNC affects only
loads and stores made to locations with uncached or cached coherent memory access types.


1.6.2 Cache Coherence Algorithms and Access Types 


The memory access types are specified by implementation-specific cache coherence
algorithms (CCAs) in TLB entries. Slightly different cache coherence algorithms such as
“cached coherent, update on write” and “cached coherent, exclusive on write” can map to
the same memory access type; in this case they both map to cached coherent. In order to
map to the same access type, the fundamental mechanism of both CCAs must be the same.
Where the access type affects the operation of an instruction, the instruction is described
in terms of the memory access types. The load and store operations in a processor proceed
according to the specific CCA of the reference, however, and the pseudocode for the load
and store common functions in 1.8.3 (2) Load and Store Memory Functions uses the CCA
value rather than the corresponding memory access type.
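
The many-to-one relationship between CCAs and memory access types can be pictured as a
lookup table. The following is only a sketch: the two "cached coherent" CCA names come
from the text above, while the other two entries, the dictionary and the function name are
hypothetical, and real CCA encodings are implementation specific.

```python
# Hypothetical CCA names; only the two "cached coherent" variants are named
# in the text. Real CCA encodings are implementation specific.
CCA_TO_ACCESS_TYPE = {
    "uncached": "uncached",
    "cached noncoherent": "cached noncoherent",
    "cached coherent, update on write": "cached coherent",
    "cached coherent, exclusive on write": "cached coherent",
}


def access_type(cca: str) -> str:
    """Return the memory access type that a named CCA maps to."""
    return CCA_TO_ACCESS_TYPE[cca]
```

Two CCAs that share a fundamental mechanism return the same access type, which is the
property that instruction descriptions rely on.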


1.6.3 Implementation-Specific Access Types 


An implementation may provide memory access types other than uncached, cached 
noncoherent, or cached coherent. Implementation-specific documentation will define the 
properties of the new access types and their effect on all memory-related operations. 


1.7 Description of an Instruction


The CPU instructions are described in alphabetic order. Each description contains several
sections that contain specific information about the instruction. The content of each
section is described in detail below. An example description is shown in Figure 1-1.


Callouts in Figure 1-1, from top to bottom:

    Instruction mnemonic and descriptive name
    Instruction encoding: constant and variable field names and values
    Architecture level at which instruction was defined/redefined, and assembler
        format(s) for each definition
    Short description
    Symbolic description
    Full description of instruction operation
    Restrictions on instruction and operands
    High-level language description of instruction operation
    Exceptions that instruction can cause
    Notes for programmers
    Notes for implementors


EXAMPLE    Example Instruction Name


31    26 25    21 20    16 15    11 10     6 5      0
SPECIAL  rs       rt       rd       0        EXAMPLE
000000                              00000    000000

Format: EXAMPLE rd, rs, rt
Purpose: To do an exampleop on 32-bit integers.


Description: rd ← rs exampleop rt


This section describes the operation of the instruction in text, tables, and pictures. It will
include information about the instruction that is hard to encode in the Operation section, so
they must be used together to completely understand the instruction.


Restrictions:
This section lists any restrictions for the instruction. Things which may be restricted
include the values of instruction encoding fields such as register specifiers, operand values,
operand formats, address alignment, instruction scheduling hazards, and type of memory
access for addressed locations.


Operation:
/* This section describes the operation of the instruction in a high-level language.
 * It is precise in ways that the Description section is not, but it is also missing
 * information that is hard to express in pseudocode. */
temp ← GPR[rs] exampleop GPR[rt]
GPR[rd] ← sign_extend(temp_31..0)


Exceptions:
A list of the exceptions taken by the instruction, such as
Integer Overflow


Programming Notes:
This contains information useful for programmers, but not necessary to describe the
operation of the instruction.


Implementation Notes:
Like Programming Notes, except for processor implementors.


Figure 1-1 Example Instruction Description


1.7.1 Instruction Mnemonic and Name 


The instruction mnemonic and name are printed as page headings for each page in the
instruction description.


1.7.2 Instruction Encoding Picture


The instruction word encoding is shown in pictorial form at the top of the instruction
description. This picture shows the values of all constant fields and the opcode names for
opcode fields in upper-case. It labels all variable fields with lower-case names that are used
in the instruction description. Fields that contain zeroes but are not named are unused fields
that are required to be zero. A summary of the instruction formats and a definition of the
terms used to describe the contents can be found in 1.10 CPU Instruction Formats.


1.7.3 Format


The assembler formats for the instruction and the architecture level at which the instruction
was originally defined are shown. If the instruction definition was later extended, the
architecture levels at which it was extended and the assembler formats for the extended
definition are shown in order of extension. The MIPS architecture levels are inclusive;
higher architecture levels include all instructions in previous levels. Extensions to
instructions are backwards compatible. The original assembler formats are valid for the
extended architecture.


The assembler format is shown with literal parts of the assembler instruction in upper-case
characters. The variable parts, the operands, are shown as the lower-case names of the
appropriate fields in the instruction encoding picture. The architecture level at which the
instruction was first defined, e.g. “MIPS I”, is shown at the right side of the page.


There can be more than one assembler format per architecture level. This is sometimes an
alternate form of the instruction. Floating-point operations on formatted data show an
assembly format with the actual assembler mnemonic for each valid value of the “fmt”
field. For example, the ADD.fmt instruction shows ADD.S and ADD.D.


The assembler format lines sometimes have comments to the right in parentheses to help
explain variations in the formats. The comments are not a part of the assembler format.


1.7.4 Purpose


This is a very short statement of the purpose of the instruction.


1.7.5 Description


If a one-line symbolic description of the instruction is feasible, it will appear immediately
to the right of the Description heading. The main purpose is to show how fields in the
instruction are used in the arithmetic or logical operation.


The body of the section is a description of the operation of the instruction in text, tables,
and figures. This description complements the high-level language description in the
Operation section.


This section uses acronyms for register descriptions. “GPR rt” is the CPU General Purpose
Register specified by the instruction field rt. “FPR fs” is the Floating Point Operand
Register specified by the instruction field fs. “CP1 register fd” is the coprocessor 1 General
Register specified by the instruction field fd. “FCSR” is the floating-point control and
status register.


1.7.6 Restrictions


This section documents the restrictions on the instruction. Most restrictions fall into one of
six categories:


• The valid values for instruction fields (see floating-point ADD.fmt).

• The alignment requirements for memory addresses (see LW).

• The valid values of operands (see DADD).

• The valid operand formats (see floating-point ADD.fmt).

• The order of instructions necessary to guarantee correct execution. These
  ordering constraints avoid pipeline hazards for which some processors do not
  have hardware interlocks (see MUL).

• The valid memory access types (see LL/SC).


1.7.7 Operation


This section describes the operation of the instruction as pseudocode in a high-level
language notation resembling Pascal. The purpose of this section is to describe the
operation of the instruction clearly in a form with less ambiguity than prose. This formal
description complements the Description section; it is not complete in itself because many
of the restrictions are either difficult to include in the pseudocode or omitted for readability.


There will be separate Operation sections for 32-bit and 64-bit processors if the operation
is different. This is usually necessary because the path to memory is a different size on
these processors.


See 1.8 Operation Section Notation and Functions for more information on the formal
notation.


1.7.8 Exceptions


This section lists the exceptions that can be caused by operation of the instruction. It omits
exceptions that can be caused by instruction fetch, e.g. TLB Refill. It omits exceptions that
can be caused by asynchronous external events, e.g. Interrupt. Although the Bus Error
exception may be caused by the operation of a load or store instruction, this section does not
list Bus Error for load and store instructions because the relationship between load and store
instructions and external error indications, like Bus Error, is implementation dependent.


Reserved Instruction is listed for every instruction not in MIPS I because the instruction
will cause this exception on a MIPS I processor. To execute a MIPS II, MIPS III, or MIPS
IV instruction, the processor must both support the architecture level and have it enabled.
The mechanism to do this is implementation specific.


The mechanism used to signal a floating-point unit (FPU) exception is implementation
specific. Some implementations use the exception named “Floating Point”. Others use
external interrupts (the Interrupt exception). This section lists Floating Point to represent
all such mechanisms. The specific FPU traps possible are listed, indented, under the
Floating Point entry.


An instruction may cause implementation-dependent exceptions that are not present in the
Exceptions section.


1.7.9 Programming Notes, Implementation Notes


These sections contain material that is useful for programmers and implementors
respectively but that is not necessary to describe the instruction and does not belong in the
description sections.


1.8 Operation Section Notation and Functions 


In an instruction description, the Operation section describes the operation performed by 
each instruction using a high-level language notation. The contents of the Operation 
section are described here. The special symbols and functions used are documented here. 


1.8.1 Pseudocode Language 


Each of the high-level language statements is executed in sequential order (as modified by
conditional and loop constructs).


1.8.2 Pseudocode Symbols 


Special symbols used in the notation are described in Table 1-29. 


Table 1-29 Symbols in Instruction Operation Statements


Symbol          Meaning

←               Assignment.
=, ≠            Tests for equality and inequality.
||              Bit string concatenation.
x^y             A y-bit string formed by y copies of the single-bit value x.
x_y..z          Selection of bits y through z of bit string x. Little-endian bit notation
                (rightmost bit is 0) is used. If y is less than z, this expression is an
                empty (zero length) bit string.
+, -            2’s complement or floating-point arithmetic: addition, subtraction.
*, ×            2’s complement or floating-point multiplication (both used for either).
div             2’s complement integer division.
mod             2’s complement modulo.
/               Floating-point division.
<               2’s complement less than comparison.
nor             Bit-wise logical NOR.
xor             Bit-wise logical XOR.
and             Bit-wise logical AND.
or              Bit-wise logical OR.
GPRLEN          The length in bits (32 or 64) of the CPU General Purpose Registers.
GPR[x]          CPU General Purpose Register x. The content of GPR[0] is always zero.
FPR[x]          Floating-Point operand register x.
FCC[cc]         Floating-Point condition code cc. FCC[0] has the same value as COC[1].
FGR[x]          Floating-Point (Coprocessor unit 1) general register x.
CPR[z,x]        Coprocessor unit z, general register x.
CCR[z,x]        Coprocessor unit z, control register x.
COC[z]          Coprocessor unit z condition signal.

BigEndianMem    Endian mode as configured at chip reset (0 → Little, 1 → Big). Specifies
                the endianness of the memory interface (see LoadMemory and StoreMemory),
                and the endianness of Kernel and Supervisor mode execution.

ReverseEndian   Signal to reverse the endianness of load and store instructions. This
                feature is available in User mode only, and is effected by setting the RE
                bit of the Status register. Thus, ReverseEndian may be computed as
                (SR_RE and User mode).

BigEndianCPU    The endianness for load and store instructions (0 → Little, 1 → Big). In
                User mode, this endianness may be switched by setting the RE bit in the
                Status register. Thus, BigEndianCPU may be computed as
                (BigEndianMem XOR ReverseEndian).

LLbit           Bit of virtual state used to specify operation for instructions that
                provide atomic read-modify-write. It is set when a linked load occurs. It
                is tested and cleared by the conditional store. It is cleared, during other
                CPU operation, when a store to the location would no longer be atomic. In
                particular, it is cleared by exception return instructions.

I:, I+n:, I-n:  This occurs as a prefix to operation description lines and functions as a
                label. It indicates the instruction time during which the effects of the
                pseudocode lines appear to occur (i.e. when the pseudocode is “executed”).
                Unless otherwise indicated, all effects of the current instruction appear
                to occur during the instruction time of the current instruction. No label
                is equivalent to a time label of “I:”. Sometimes effects of an instruction
                appear to occur either earlier or later, during the instruction time of
                another instruction. When that happens, the instruction operation is
                written in sections labelled with the instruction time, relative to the
                current instruction I, in which the effect of that pseudocode appears to
                occur. For example, an instruction may have a result that is not available
                until after the next instruction. Such an instruction will have the
                portion of the instruction operation description that writes the result
                register in a section labelled “I+1:”.

                The effect of pseudocode statements for the current instruction labelled
                “I+1:” appears to occur “at the same time” as the effect of pseudocode
                statements labelled “I:” for the following instruction. Within one
                pseudocode sequence the effects of the statements take place in order.
                However, between sequences of statements for different instructions that
                occur “at the same time”, there is no order defined. Programs must not
                depend on a particular order of evaluation between such sections.

PC              The Program Counter value. During the instruction time of an instruction
                this is the address of the instruction word. The address of the
                instruction that occurs during the next instruction time is determined by
                assigning a value to PC during an instruction time. If no value is
                assigned to PC during an instruction time by any pseudocode statement, it
                is automatically incremented by 4 before the next instruction time. A
                taken branch assigns the target address to PC during the instruction time
                of the instruction in the branch delay slot.

PSIZE           The size, in bits, of a physical address in an implementation.
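
The bit-string symbols in Table 1-29 can be modelled with ordinary integer arithmetic. The
sketch below is illustrative only: the function names are hypothetical, and because Python
integers carry no width, the caller has to pass widths explicitly where the pseudocode
leaves them implicit.

```python
def replicate(x: int, y: int) -> int:
    """x^y: a y-bit string formed by y copies of the single-bit value x."""
    if x not in (0, 1):
        raise ValueError("x must be a single bit")
    return (1 << y) - 1 if x else 0


def select(x: int, y: int, z: int) -> int:
    """x_y..z: bits y through z of x, where bit 0 is the rightmost bit."""
    if y < z:
        return 0  # empty (zero-length) bit string, modelled here as 0
    width = y - z + 1
    return (x >> z) & ((1 << width) - 1)


def concat(hi: int, lo: int, lo_width: int) -> int:
    """hi || lo, where lo is lo_width bits wide."""
    return (hi << lo_width) | (lo & ((1 << lo_width) - 1))


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the sign bit of value and concatenate it with the low bits."""
    sign = select(value, from_bits - 1, from_bits - 1)
    low = select(value, from_bits - 1, 0)
    return concat(replicate(sign, to_bits - from_bits), low, from_bits)
```

For instance, `sign_extend(0x8000, 16, 32)` builds sixteen copies of bit 15 and concatenates
them with the original 16 bits, which is how a 16-bit immediate becomes a 32-bit operand.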


1.8.3 Pseudocode Functions 


There are several functions used in the pseudocode descriptions. These are used either to 
make the pseudocode more readable, to abstract implementation specific behavior, or both. 
The functions are defined in this section. 


(1) Coprocessor General Register Access Functions 


Defined coprocessors, except for CP0, have instructions to exchange words and
doublewords between coprocessor general registers and the rest of the system. What a
coprocessor does with a word or doubleword supplied to it and how a coprocessor supplies
a word or doubleword is defined by the coprocessor itself. This behavior is abstracted into
functions:


Table 1-30 Coprocessor General Register Access Functions


COP_LW (z, rt, memword)

    z:          The coprocessor unit number.
    rt:         Coprocessor general register specifier.
    memword:    A 32-bit word value supplied to the coprocessor.

    This is the action taken by coprocessor z when supplied with a word from memory during
    a load word operation. The action is coprocessor specific. The typical action would be
    to store the contents of memword in coprocessor general register rt.


COP_LD (z, rt, memdouble)

    z:          The coprocessor unit number.
    rt:         Coprocessor general register specifier.
    memdouble:  A 64-bit doubleword value supplied to the coprocessor.

    This is the action taken by coprocessor z when supplied with a doubleword from memory
    during a load doubleword operation. The action is coprocessor specific. The typical
    action would be to store the contents of memdouble in coprocessor general register rt.


dataword ← COP_SW (z, rt)

    z:          The coprocessor unit number.
    rt:         Coprocessor general register specifier.
    dataword:   A 32-bit word value.

    This defines the action taken by coprocessor z to supply a word of data during a store
    word operation. The action is coprocessor specific. The typical action would be to
    supply the contents of the low-order word in coprocessor general register rt.


datadouble ← COP_SD (z, rt)

    z:          The coprocessor unit number.
    rt:         Coprocessor general register specifier.
    datadouble: A 64-bit doubleword value.

    This defines the action taken by coprocessor z to supply a doubleword of data during a
    store doubleword operation. The action is coprocessor specific. The typical action
    would be to supply the contents of the doubleword in coprocessor general register rt.


(2) Load and Store Memory Functions 


Regardless of byte ordering (big- or little-endian), the address of a halfword, word, or
doubleword is the smallest byte address among the bytes forming the object. For
big-endian ordering this is the most-significant byte; for little-endian ordering this is the
least-significant byte.
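
As a small illustration of this addressing rule, the sketch below stores the same 32-bit word
under both byte orderings and inspects the byte at the word's address. The memory model
(a `bytearray`) and the function name are assumptions made for the example.

```python
def store_word(memory: bytearray, addr: int, value: int, big_endian: bool) -> None:
    """Store a 32-bit word so that it occupies byte addresses addr..addr+3."""
    memory[addr:addr + 4] = value.to_bytes(4, "big" if big_endian else "little")


mem_be = bytearray(8)
mem_le = bytearray(8)
store_word(mem_be, 4, 0x11223344, big_endian=True)
store_word(mem_le, 4, 0x11223344, big_endian=False)

# The word's address is 4 in both cases (the smallest byte address).
# Big-endian: address 4 holds the most-significant byte.
# Little-endian: address 4 holds the least-significant byte.
byte_at_address_be = mem_be[4]
byte_at_address_le = mem_le[4]
```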


In the operation description pseudocode for load and store operations, the functions shown
below are used to summarize the handling of virtual addresses and accessing physical
memory. The size of the data item to be loaded or stored is passed in the AccessLength
field. The valid constant names and values are shown in Table 1-31. The bytes within the
addressed unit of memory (word for 32-bit processors or doubleword for 64-bit processors)
which are used can be determined directly from the AccessLength and the two or three
low-order bits of the address.


(pAddr, CCA) ← AddressTranslation (vAddr, IorD, LorS)

    pAddr:  Physical Address.
    CCA:    Cache Coherence Algorithm: the method used to access caches and memory and
            resolve the reference.
    vAddr:  Virtual Address.
    IorD:   Indicates whether access is for INSTRUCTION or DATA.
    LorS:   Indicates whether access is for LOAD or STORE.

    Translate a virtual address to a physical address and a cache coherence algorithm
    describing the mechanism used to resolve the memory reference.

    Given the virtual address vAddr, and whether the reference is to Instructions or Data
    (IorD), find the corresponding physical address (pAddr) and the cache coherence
    algorithm (CCA) used to resolve the reference. If the virtual address is in one of the
    unmapped address spaces, the physical address and CCA are determined directly by the
    virtual address. If the virtual address is in one of the mapped address spaces, then the
    TLB is used to determine the physical address and access type; if the required
    translation is not present in the TLB or the desired access is not permitted, the
    function fails and an exception is taken.


MemElem ← LoadMemory (CCA, AccessLength, pAddr, vAddr, IorD)

    MemElem:       Data is returned in a fixed width with a natural alignment. The width
                   is the same size as the CPU general purpose register, 32 or 64 bits,
                   aligned on a 32 or 64-bit boundary respectively.
    CCA:           Cache Coherence Algorithm: the method used to access caches and
                   memory and resolve the reference.
    AccessLength:  Length, in bytes, of access.
    pAddr:         Physical Address.
    vAddr:         Virtual Address.
    IorD:          Indicates whether access is for Instructions or Data.

    Load a value from memory.

    Uses the cache and main memory as specified in the Cache Coherence Algorithm (CCA)
    and the sort of access (IorD) to find the contents of AccessLength memory bytes
    starting at physical location pAddr. The data is returned in the fixed width
    naturally-aligned memory element (MemElem). The low-order two (or three) bits of the
    address and the AccessLength indicate which of the bytes within MemElem need to be
    given to the processor. If the memory access type of the reference is uncached, then
    only the referenced bytes are read from memory and valid within the memory element. If
    the access type is cached, and the data is not present in cache, an implementation
    specific size and alignment block of memory is read and loaded into the cache to
    satisfy a load reference. At a minimum, the block is the entire memory element.


StoreMemory (CCA, AccessLength, MemElem, pAddr, vAddr)

    CCA:           Cache Coherence Algorithm: the method used to access caches and
                   memory and resolve the reference.
    AccessLength:  Length, in bytes, of access.
    MemElem:       Data in the width and alignment of a memory element. The width is the
                   same size as the CPU general purpose register, 4 or 8 bytes, aligned
                   on a 4 or 8-byte boundary. For a partial-memory-element store, only
                   the bytes that will be stored must be valid.
    pAddr:         Physical Address.
    vAddr:         Virtual Address.

    Store a value to memory.

    The specified data is stored into the physical location pAddr using the memory
    hierarchy (data caches and main memory) as specified by the Cache Coherence Algorithm
    (CCA). The MemElem contains the data for an aligned, fixed-width memory element (word
    for 32-bit processors, doubleword for 64-bit processors), though only the bytes that
    will actually be stored to memory need to be valid. The low-order two (or three) bits
    of pAddr and the AccessLength field indicate which of the bytes within the MemElem
    data should actually be stored; only these bytes in memory will be changed.


Prefetch (CCA, pAddr, vAddr, DATA, hint)

    CCA:    Cache Coherence Algorithm: the method used to access caches and memory and
            resolve the reference.
    pAddr:  Physical Address.
    vAddr:  Virtual Address.
    DATA:   Indicates that access is for DATA.
    hint:   Hint that indicates the possible use of the data.

    Prefetch data from memory.

    Prefetch is an advisory instruction for which an implementation specific action is
    taken. The action taken may increase performance but must not change the meaning of
    the program or alter architecturally-visible state.


Table 1-31 AccessLength Specifications for Loads/Stores


AccessLength Name   Value   Meaning
DOUBLEWORD          7       8 bytes (64 bits)
SEPTIBYTE           6       7 bytes (56 bits)
SEXTIBYTE           5       6 bytes (48 bits)
QUINTIBYTE          4       5 bytes (40 bits)
WORD                3       4 bytes (32 bits)
TRIPLEBYTE          2       3 bytes (24 bits)
HALFWORD            1       2 bytes (16 bits)
BYTE                0       1 byte (8 bits)
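
Each AccessLength value is one less than the number of bytes accessed. The sketch below
shows one way to derive the byte offsets used within a memory element from the AccessLength
and the low-order address bits. It assumes the access starts at the addressed byte and runs
upward within a single memory element; the function name and the error handling are
hypothetical.

```python
# Constant names and values as listed in Table 1-31.
ACCESS_LENGTH = {
    "BYTE": 0, "HALFWORD": 1, "TRIPLEBYTE": 2, "WORD": 3,
    "QUINTIBYTE": 4, "SEXTIBYTE": 5, "SEPTIBYTE": 6, "DOUBLEWORD": 7,
}


def bytes_accessed(access_length: int, p_addr: int, elem_bytes: int = 8) -> list[int]:
    """Return the byte offsets within the memory element touched by an access.

    elem_bytes is 4 on a 32-bit processor and 8 on a 64-bit processor.
    """
    if elem_bytes not in (4, 8):
        raise ValueError("memory element must be 4 or 8 bytes")
    low = p_addr & (elem_bytes - 1)  # the two or three low-order address bits
    count = access_length + 1        # AccessLength encodes (byte count - 1)
    if low + count > elem_bytes:
        raise ValueError("access crosses the memory element boundary")
    return list(range(low, low + count))
```

A WORD access at physical address 0x1004 on a 64-bit processor, for example, uses byte
offsets 4 through 7 of the doubleword element.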


(3) Access Functions for Floating-Point Registers 


The details of the relationship between CP1 general registers and floating-point operand
registers are encapsulated in the functions included in this section. See 2.7 Valid Operands
for FP Instructions for more information.


This function returns the current logical width, in bits, of the CP1 general registers. All
32-bit processors will return “32”. 64-bit processors will return “32” when in 32-bit-CP1-
register emulation mode and “64” when in native 64-bit mode.


The following pseudocode referring to the Status_FR bit is valid for all existing
MIPS 64-bit processors at the time of this writing; however, this is a privileged
processor-specific mechanism and it may be different in some future
processor.


SizeFGR() -- current size, in bits, of the CP1 general registers

size ← SizeFGR()
    if 32_bit_processor then
        size ← 32
    else
        /* 64-bit processor */
        if Status_FR = 1 then
            size ← 64
        else
            size ← 32
        endif
    endif


This pseudocode specifies how the unformatted contents loaded or moved to CP1 registers
are interpreted to form a formatted value. If an FPR contains a value in some format, rather
than unformatted contents from a load (uninterpreted), it is valid to interpret the value in
that format, but not to interpret it in a different format.


ValueFPR() -- Get a formatted value from an FPR.

value ← ValueFPR (fpr, fmt)          /* get a formatted value from an FPR */
    if SizeFGR() = 64 then
        case fmt of
            S, W:
                value ← FGR[fpr]_31..0
            D, L:
                value ← FGR[fpr]
        endcase
    elseif fpr_0 = 0 then            /* fpr is valid (even), 32-bit wide FGRs */
        case fmt of
            S, W:
                value ← FGR[fpr]
            D, L:
                value ← FGR[fpr+1] || FGR[fpr]
        endcase
    else                             /* undefined for odd 32-bit FGRs */
        UndefinedResult()
    endif
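
The register-width and format cases above can be exercised with a small model. This is a
sketch under stated assumptions: the FGRs are a Python list of unsigned integers, the
register width is passed in as `fgr_size` rather than read from a Status register, and the
undefined case is modelled by raising an exception. All names are hypothetical.

```python
def size_fgr(is_32_bit_processor: bool, status_fr: int) -> int:
    """Model of SizeFGR(): logical width, in bits, of the CP1 general registers."""
    if is_32_bit_processor:
        return 32
    return 64 if status_fr == 1 else 32


def value_fpr(fgr: list[int], fpr: int, fmt: str, fgr_size: int) -> int:
    """Return the raw bits of the formatted value held in FPR number fpr."""
    if fmt not in ("S", "W", "D", "L"):
        raise ValueError("unknown format")
    if fgr_size == 64:
        if fmt in ("S", "W"):
            return fgr[fpr] & 0xFFFFFFFF        # low 32 bits of the 64-bit FGR
        return fgr[fpr]
    if fpr & 1 == 0:                            # even register, 32-bit wide FGRs
        if fmt in ("S", "W"):
            return fgr[fpr]
        return (fgr[fpr + 1] << 32) | fgr[fpr]  # FGR[fpr+1] || FGR[fpr]
    raise ValueError("UndefinedResult: odd FPR with 32-bit wide FGRs")
```

With 32-bit wide FGRs, a D-format value in FPR 0 is assembled from the register pair, with
FGR[1] supplying the high-order word.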


This pseudocode specifies the way that a binary encoding representing a formatted value is
stored into CP1 registers by a computational or move operation. This binary representation
is visible to store or move-from instructions. Once an FPR contains a value via StoreFPR(),
it is not valid to interpret the value with ValueFPR() in a different format.


StoreFPR() -- store a formatted value into an FPR.

StoreFPR(fpr, fmt, value):           /* place a formatted value into an FPR */
    if SizeFGR() = 64 then           /* 64-bit wide FGRs */
        case fmt of
            S, W:
                FGR[fpr] ← undefined^32 || value
            D, L:
                FGR[fpr] ← value
        endcase
    elseif fpr_0 = 0 then            /* fpr is valid (even), 32-bit wide FGRs */
        case fmt of
            S, W:
                FGR[fpr+1] ← undefined^32
                FGR[fpr] ← value
            D, L:
                FGR[fpr+1] ← value_63..32
                FGR[fpr] ← value_31..0
        endcase
    else                             /* undefined for odd 32-bit FGRs */
        UndefinedResult()
    endif
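
The store direction can be sketched in the same way. Here the FGRs are assumed to be a
Python list of unsigned integers, the register width is passed in as `fgr_size`, the
architecturally undefined bits are filled with a placeholder value of zero, and the
undefined case raises an exception; none of these choices comes from the architecture
itself.

```python
UNDEFINED32 = 0  # placeholder for 32 architecturally undefined bits


def store_fpr(fgr: list[int], fpr: int, fmt: str, value: int, fgr_size: int) -> None:
    """Place the raw bits of a formatted value into FPR number fpr."""
    if fmt not in ("S", "W", "D", "L"):
        raise ValueError("unknown format")
    if fgr_size == 64:
        if fmt in ("S", "W"):
            fgr[fpr] = (UNDEFINED32 << 32) | (value & 0xFFFFFFFF)
        else:
            fgr[fpr] = value & 0xFFFFFFFFFFFFFFFF
    elif fpr & 1 == 0:                               # even register, 32-bit FGRs
        if fmt in ("S", "W"):
            fgr[fpr + 1] = UNDEFINED32
            fgr[fpr] = value & 0xFFFFFFFF
        else:
            fgr[fpr + 1] = (value >> 32) & 0xFFFFFFFF  # bits 63..32 of value
            fgr[fpr] = value & 0xFFFFFFFF              # bits 31..0 of value
    else:
        raise ValueError("UndefinedResult: odd FPR with 32-bit wide FGRs")
```

Because the upper bits after an S or W store are undefined, code that relies on the
placeholder value used here would not be portable.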


(4) Miscellaneous Functions


SyncOperation(stype)

    stype:  Type of load/store ordering to perform.

    Order loads and stores to synchronize shared memory.

    Perform the action necessary to make the effects of groups of synchronizable loads and
    stores indicated by stype occur in the same order for all processors.


SignalException(Exception)

    Exception:  The exception condition that exists.

    Signal an exception condition.

    This will result in an exception that aborts the instruction. The instruction operation
    pseudocode will never see a return from this function call.


UndefinedResult()

    This function indicates that the result of the operation is undefined.


NullifyCurrentInstruction()

    Nullify the current instruction.

    This occurs during the instruction time for some instruction and that instruction is not
    executed further. This appears for branch-likely instructions during the execution of the
    instruction in the delay slot and it kills the instruction in the delay slot.


CoprocessorOperation (z, cop_fun)

    z:        Coprocessor unit number.
    cop_fun:  Coprocessor function from function field of instruction.

    Perform the specified Coprocessor operation.
1.9 Individual CPU Instruction Descriptions 


ADD    Add Word


31    26 25    21 20    16 15    11 10     6 5      0
SPECIAL  rs       rt       rd       0        ADD
000000                              00000    100000
6        5        5        5        5        6

Format: ADD rd, rs, rt                                                  MIPS I
Purpose: To add 32-bit integers. If overflow occurs, then trap.


Description: rd ← rs + rt


The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32-bit
result. If the addition results in 32-bit 2’s complement arithmetic overflow then the destination
register is not modified and an Integer Overflow exception occurs. If it does not overflow, the
32-bit result is placed into GPR rd.


Restrictions:
On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
temp ← GPR[rs] + GPR[rt]
if (32_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rd] ← sign_extend(temp_31..0)
endif


Exceptions:
Integer Overflow


Programming Notes:
ADDU performs the same arithmetic operation but does not trap on overflow.
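
The ADD operation can be sketched as follows. The register file is assumed to be a Python
list of signed integers, each holding a sign-extended 32-bit value; the exception class, the
function name and the use of `ValueError` for the undefined case are hypothetical stand-ins
for SignalException and UndefinedResult.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


class IntegerOverflow(Exception):
    """Stands in for the Integer Overflow exception."""


def add(gpr: list[int], rd: int, rs: int, rt: int) -> None:
    """ADD rd, rs, rt on a register file of signed Python integers."""
    a, b = gpr[rs], gpr[rt]
    for operand in (a, b):
        if not INT32_MIN <= operand <= INT32_MAX:
            raise ValueError("UndefinedResult: operand is not a sign-extended word")
    temp = a + b
    if not INT32_MIN <= temp <= INT32_MAX:
        raise IntegerOverflow()  # the destination register is left unmodified
    if rd != 0:                  # GPR[0] always reads as zero
        gpr[rd] = temp
```

Adding 1 to 0x7FFFFFFF raises `IntegerOverflow` and leaves `gpr[rd]` untouched, which
mirrors the "destination register is not modified" wording in the Description.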


ADDI    Add Immediate Word


31    26 25    21 20    16 15                 0
ADDI     rs       rt       immediate
001000
6        5        5        16

Format: ADDI rt, rs, immediate                                          MIPS I
Purpose: To add a constant to a 32-bit integer. If overflow occurs, then trap.


Description: rt ← rs + immediate


The 16-bit signed immediate is added to the 32-bit value in GPR rs to produce a 32-bit result.
If the addition results in 32-bit 2’s complement arithmetic overflow then the destination register
is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit
result is placed into GPR rt.


Restrictions:
On 64-bit processors, if GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


Operation:
if (NotWordValue(GPR[rs])) then UndefinedResult() endif
temp ← GPR[rs] + sign_extend(immediate)
if (32_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rt] ← sign_extend(temp_31..0)
endif


Exceptions:
Integer Overflow


Programming Notes:
ADDIU performs the same arithmetic operation but does not trap on overflow.


ADDIU    Add Immediate Unsigned Word


31    26 25    21 20    16 15                 0
ADDIU    rs       rt       immediate
001001
6        5        5        16

Format: ADDIU rt, rs, immediate                                         MIPS I
Purpose: To add a constant to a 32-bit integer.


Description: rt ← rs + immediate


The 16-bit signed immediate is added to the 32-bit value in GPR rs and the 32-bit arithmetic
result is placed into GPR rt.


No Integer Overflow exception occurs under any circumstances.


Restrictions:
On 64-bit processors, if GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


Operation:
if (NotWordValue(GPR[rs])) then UndefinedResult() endif
temp ← GPR[rs] + sign_extend(immediate)
GPR[rt] ← sign_extend(temp_31..0)


Exceptions:
None


Programming Notes:
The term “unsigned” in the instruction name is a misnomer; this operation is 32-bit modulo
arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed,
such as address arithmetic, or integer arithmetic environments that ignore overflow, such as “C”
language arithmetic.
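
A short sketch makes the "misnomer" concrete: the immediate is still sign-extended, and the
only difference from a trapping add is that the sum wraps modulo 2^32. The function name is
hypothetical, and the result is returned as an unsigned 32-bit pattern (the further sign
extension to 64 bits on a 64-bit processor is not modelled).

```python
def addiu(rs_value: int, immediate: int) -> int:
    """Return the low 32 bits of rs_value + sign_extend(immediate)."""
    imm = immediate & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000                    # sign_extend(immediate)
    return (rs_value + imm) & 0xFFFFFFFF  # wraps modulo 2**32, never traps
```

An immediate of 0xFFFC therefore subtracts 4, which is why ADDIU is commonly used for
adjusting addresses in either direction.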
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ADDU Add Unsigned Word 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 ADDU 
000000 00000 / 100001 
6 5 5 5 5 6 
Format: ADDU 1d, rs, rt MIPS | 


Purpose: To add 32-bit integers. 


Description: rd< rs + rt 


The 32-bit word value in GPR rtis added to the 32-bit value in GPR rs and the 32-bit arithmetic 
result is placed into GPR rd. 


No Integer Overflow exception occurs under any circumstances. 


Restrictions: 
On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined. 

Operation: 
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
temp ← GPR[rs] + GPR[rt]
GPR[rd] ← sign_extend(temp31..0)

Exceptions: 


None 


Programming Notes: 


The term “unsigned” in the instruction name is a misnomer; this operation is 32-bit modulo 
arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, 
such as address arithmetic, or integer arithmetic environments that ignore overflow, such as “C” 
language arithmetic. 
And AND


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 AND 
000000 00000 100100 
6 5 5 5 5 6 
Format: AND rd, rs, rt MIPS I
Purpose: To do a bitwise logical AND.


Description: rd ← rs AND rt


The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical AND 
operation. The result is placed into GPR rd. 


Restrictions: 


None 


Operation: 
GPR[rd] ← GPR[rs] and GPR[rt]


Exceptions: 


None 
And Immediate ANDI


31 26 25 21 20 16 15 0 
ANDI rs rt immediate 
001100 
6 5 5 16 
Format: ANDI rt, rs, immediate MIPS I
Purpose: To do a bitwise logical AND with a constant.


Description: rt ← rs AND immediate


The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in 
a bitwise logical AND operation. The result is placed into GPR rt. 


Restrictions: 


None 


Operation: 
GPR[rt] ← zero_extend(immediate) and GPR[rs]


Exceptions: 


None 
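
Because the immediate is zero-extended rather than sign-extended, bits 63..16 of the ANDI result are always zero. The following Python sketch shows this; the function name `andi` is hypothetical and registers are assumed to be modeled as 64-bit unsigned integers.

```python
MASK64 = (1 << 64) - 1


def andi(rs_value, immediate):
    """AND GPR rs with the zero-extended 16-bit immediate."""
    zero_extended = immediate & 0xFFFF   # upper 48 bits are zero
    return rs_value & zero_extended & MASK64
```

So `andi(value, 0xFFFF)` isolates the low halfword of a register, even when the upper bits of `value` are all ones.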
Branch on Equal BEQ


31 26 25 21 20 16 15 0 
BEQ rs rt offset 
000100 
6 5 5 16 
Format: BEQ rs, rt, offset MIPS I
Purpose: To compare GPRs then do a PC-relative conditional branch. 


Description: if (rs = rt) then branch 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs and GPR rt are equal, branch to the effective target address after the 
instruction in the delay slot is executed. 


Restrictions: 
None 


Operation: 


I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← (GPR[rs] = GPR[rt])
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 
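
The target-address calculation can be sketched in Python as below. The function name `branch_target` is hypothetical; the sketch assumes 4-byte instructions, so the delay slot sits at the branch address plus 4, and it models addresses as 64-bit unsigned integers.

```python
MASK64 = (1 << 64) - 1


def branch_target(branch_pc, offset):
    """Effective target of a PC-relative branch located at branch_pc."""
    tgt_offset = (offset & 0xFFFF) << 2      # offset || 0^2, an 18-bit value
    if tgt_offset & (1 << 17):               # sign bit of the 18-bit value
        tgt_offset -= 1 << 18
    delay_slot_pc = branch_pc + 4            # instruction following the branch
    return (delay_slot_pc + tgt_offset) & MASK64
```

The largest forward offset, `0x7FFF`, reaches 131068 bytes past the delay slot, and `0x8000` reaches 131072 bytes before it. An offset of `0xFFFF` (-1 instruction) targets the branch itself.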


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
BEQL Branch on Equal Likely


31 26 25 21 20 16 15 0 
BEQL rs rt offset 
010100 
6 5 5 16 
Format: BEQL rs, rt, offset MIPS II
Purpose: To compare GPRs then do a PC-relative conditional branch; execute the delay slot
only if the branch is taken.


Description: if (rs = rt) then branch_likely 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs and GPR rt are equal, branch to the target address after the instruction 
in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not 
executed. 


Restrictions: 


None 


Operation: 
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← (GPR[rs] = GPR[rt])
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 


Reserved Instruction 
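
The difference between a "likely" branch and an ordinary branch is what happens to the delay slot when the branch is not taken. A rough Python sketch follows; the function name `beql_step` and its return convention are hypothetical, and 4-byte instructions are assumed.

```python
def beql_step(pc, rs_value, rt_value, offset):
    """Return (next_pc, run_delay_slot) for a BEQL located at address pc.

    next_pc is where execution continues once the delay slot has been
    either executed or nullified.
    """
    tgt_offset = (offset & 0xFFFF) << 2
    if tgt_offset & (1 << 17):
        tgt_offset -= 1 << 18
    if rs_value == rt_value:
        return pc + 4 + tgt_offset, True   # delay slot runs, then branch
    return pc + 8, False                   # delay slot is nullified
```

With an ordinary BEQ the second element would be `True` in both cases, since the delay slot always executes.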


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
Branch on Greater Than or Equal to Zero BGEZ
31 26 25 21 20 16 15 0 
REGIMM rs BGEZ offset 
000001 00001 
6 5 5 16 
Format: BGEZ rs, offset MIPS I
Purpose: To test a GPR then do a PC-relative conditional branch.


Description: if (rs ≥ 0) then branch


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective 
target address after the instruction in the delay slot is executed. 


Restrictions: 
None 


Operation: 


I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≥ 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
BGEZAL Branch on Greater Than or Equal to Zero and Link
31 26 25 21 20 16 15 0 
REGIMM rs BGEZAL offset 
000001 10001 
6 5 5 16 
Format: BGEZAL rs, offset MIPS I
Purpose: To test a GPR then do a PC-relative conditional procedure call.


Description: if (rs ≥ 0) then procedure_call


Place the return address link in GPR 31. The return link is the address of the second instruction 
following the branch, where execution would continue after a procedure call. 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective 
target address after the instruction in the delay slot is executed. 


Restrictions: 


GPR 31 must not be used for the source register rs, because such an instruction does not have 
the same effect when re-executed. The result of executing such an instruction is undefined. 
This restriction permits an exception handler to resume execution by re-executing the branch 
when an exception occurs in the branch delay slot. 


Operation: 

I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≥ 0^GPRLEN
     GPR[31] ← PC + 8
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 
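
In the Operation pseudocode the link register is written outside the `if`, so GPR 31 receives PC + 8 whether or not the branch is taken. A small Python sketch of that behavior follows; the function name `bgezal` and the `(target, link)` return shape are hypothetical, and rs is modeled as a 64-bit unsigned register image.

```python
def bgezal(pc, rs_value, offset):
    """Return (target_or_None, link) for a BGEZAL located at address pc."""
    link = pc + 8                          # second instruction after the branch
    tgt_offset = (offset & 0xFFFF) << 2
    if tgt_offset & (1 << 17):
        tgt_offset -= 1 << 18
    taken = ((rs_value >> 63) & 1) == 0    # sign bit 0 means rs >= 0
    target = pc + 4 + tgt_offset if taken else None
    return target, link
```

The called procedure returns to `link`, which skips both the branch and its delay slot.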


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more
distant addresses.
Branch on Greater Than or Equal to Zero and Link Likely BGEZALL


31 26 25 21 20 16 15 0 
REGIMM rs BGEZALL offset 
000001 10011 
6 5 5 16 
Format: BGEZALL rs, offset MIPS II
Purpose: To test a GPR then do a PC-relative conditional procedure call; execute the delay
slot only if the branch is taken.


Description: if (rs ≥ 0) then procedure_call_likely


Place the return address link in GPR 31. The return link is the address of the second instruction 
following the branch, where execution would continue after a procedure call. 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective 
target address after the instruction in the delay slot is executed. If the branch is not taken, the 
instruction in the delay slot is not executed. 


Restrictions: 


GPR 31 must not be used for the source register rs, because such an instruction does not have 
the same effect when re-executed. The result of executing such an instruction is undefined. 
This restriction permits an exception handler to resume execution by re-executing the branch 
when an exception occurs in the branch delay slot. 


Operation: 

I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≥ 0^GPRLEN
     GPR[31] ← PC + 8
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more
distant addresses.
BGEZL Branch on Greater Than or Equal to Zero Likely


31 26 25 21 20 16 15 0 


REGIMM rs BGEZL offset 
000001 00011 


6 5 5 16 


Format: BGEZL rs, offset MIPS II


Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only
if the branch is taken.


Description: if (rs ≥ 0) then branch_likely


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective 
target address after the instruction in the delay slot is executed. If the branch is not taken, the 
instruction in the delay slot is not executed. 


Restrictions: 


None 


Operation: 
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≥ 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
Branch on Greater Than Zero BGTZ
31 26 25 21 20 16 15 0 
BGTZ rs 0 offset 
000111 00000 
6 5 5 16 
Format: BGTZ rs, offset MIPS I


Purpose: To test a GPR then do a PC-relative conditional branch. 


Description: if (rs > 0) then branch 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the 
effective target address after the instruction in the delay slot is executed. 


Restrictions: 
None 


Operation: 


I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] > 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 
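
The compare-with-zero branches (BGEZ, BGTZ, BLEZ, BLTZ) differ only in the condition they evaluate on GPR rs. The Python sketch below restates the four conditions side by side. The table name `BRANCH_CONDITION` and the helper `to_signed64` are hypothetical, and rs is modeled as a 64-bit unsigned register image.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register image as a two's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


# Branch conditions keyed by mnemonic (the dictionary is illustrative only).
BRANCH_CONDITION = {
    "BGEZ": lambda rs: to_signed64(rs) >= 0,   # sign bit is 0
    "BGTZ": lambda rs: to_signed64(rs) > 0,    # sign bit is 0, value not zero
    "BLEZ": lambda rs: to_signed64(rs) <= 0,   # sign bit is 1 or value is zero
    "BLTZ": lambda rs: to_signed64(rs) < 0,    # sign bit is 1
}
```

Zero is the value that separates the pairs: it satisfies BGEZ and BLEZ but neither BGTZ nor BLTZ.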


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
BGTZL Branch on Greater Than Zero Likely


31 26 25 21 20 16 15 0 


BGTZL rs 0 offset 
010111 00000 


6 5 5 16 


Format: BGTZL rs, offset MIPS II


Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only 
if the branch is taken. 


Description: if (rs > 0) then branch_likely 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the 
effective target address after the instruction in the delay slot is executed. If the branch is not 
taken, the instruction in the delay slot is not executed. 


Restrictions: 


None 


Operation: 
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] > 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
Branch on Less Than or Equal to Zero BLEZ
31 26 25 21 20 16 15 0 
BLEZ rs 0 offset 
000110 00000 
6 5 5 16 
Format: BLEZ rs, offset MIPS I
Purpose: To test a GPR then do a PC-relative conditional branch.


Description: if (rs ≤ 0) then branch


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to 
the effective target address after the instruction in the delay slot is executed. 


Restrictions: 
None 


Operation: 


I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≤ 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
BLEZL Branch on Less Than or Equal to Zero Likely


31 26 25 21 20 16 15 0 


BLEZL rs 0 offset 
010110 00000 


6 5 5 16 


Format: BLEZL rs, offset MIPS II


Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only
if the branch is taken.


Description: if (rs ≤ 0) then branch_likely


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to 
the effective target address after the instruction in the delay slot is executed. If the branch is not 
taken, the instruction in the delay slot is not executed. 


Restrictions: 


None 


Operation: 
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≤ 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
Branch on Less Than Zero BLTZ
31 26 25 21 20 16 15 0 
REGIMM rs BLTZ offset 
000001 00000 
6 5 5 16 
Format: BLTZ rs, offset MIPS I
Purpose: To test a GPR then do a PC-relative conditional branch. 


Description: if (rs < 0) then branch 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address 
after the instruction in the delay slot is executed. 


Restrictions: 
None 


Operation: 


I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] < 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
BLTZAL Branch on Less Than Zero And Link


31 26 25 21 20 16 15 0 
REGIMM rs BLTZAL offset 
000001 10000 
6 5 5 16 
Format: BLTZAL rs, offset MIPS I
Purpose: To test a GPR then do a PC-relative conditional procedure call. 


Description: if (rs < 0) then procedure_call 


Place the return address link in GPR 31. The return link is the address of the second instruction 
following the branch (not the branch itself), where execution would continue after a procedure 
call. 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch, in the branch delay slot, to form a PC-relative effective target 
address. 


If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address 
after the instruction in the delay slot is executed. 


Restrictions: 


GPR 31 must not be used for the source register rs, because such an instruction does not have 
the same effect when re-executed. The result of executing such an instruction is undefined. 
This restriction permits an exception handler to resume execution by re-executing the branch 
when an exception occurs in the branch delay slot. 


Operation: 

I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] < 0^GPRLEN
     GPR[31] ← PC + 8
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more
distant addresses.
Branch on Less Than Zero And Link Likely BLTZALL
31 26 25 21 20 16 15 0 
REGIMM rs BLTZALL offset 
000001 10010 
6 5 5 16 
Format: BLTZALL rs, offset MIPS II
Purpose: To test a GPR then do a PC-relative conditional procedure call; execute the delay
slot only if the branch is taken.


Description: if (rs < 0) then procedure_call_likely 


Place the return address link in GPR 31. The return link is the address of the second instruction 
following the branch (not the branch itself), where execution would continue after a procedure 
call. 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch, in the branch delay slot, to form a PC-relative effective target 
address. 


If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address 
after the instruction in the delay slot is executed. If the branch is not taken, the instruction in 
the delay slot is not executed. 


Restrictions: 
GPR 31 must not be used for the source register rs, because such an instruction does not have 
the same effect when re-executed. The result of executing such an instruction is undefined. 
This restriction permits an exception handler to resume execution by re-executing the branch 
when an exception occurs in the branch delay slot. 


Operation: 

I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] < 0^GPRLEN
     GPR[31] ← PC + 8
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more
distant addresses.
BLTZL Branch on Less Than Zero Likely
31 26 25 21 20 16 15 0 
REGIMM rs BLTZL offset 
000001 00010 
6 5 5 16 
Format: BLTZL rs, offset MIPS II
Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only
if the branch is taken.


Description: if (rs < 0) then branch_likely 


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address 
after the instruction in the delay slot is executed. If the branch is not taken, the instruction in 
the delay slot is not executed. 


Restrictions: 


None 


Operation: 
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] < 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
Branch on Not Equal BNE
31 26 25 21 20 16 15 0 
BNE rs rt offset 
000101 
6 5 5 16 
Format: BNE rs, rt, offset MIPS I
Purpose: To compare GPRs then do a PC-relative conditional branch.


Description: if (rs ≠ rt) then branch


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after 
the instruction in the delay slot is executed. 


Restrictions: 
None 


Operation: 


I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← (GPR[rs] ≠ GPR[rt])
I+1: if condition then
         PC ← PC + tgt_offset
     endif


Exceptions: 
None 
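
The field layout at the top of this page (6-bit opcode 000101, 5-bit rs, 5-bit rt, 16-bit offset) can be turned into an instruction word as sketched below. The function name `encode_bne` is hypothetical; the offset argument is assumed to be given in instructions, as a signed Python integer.

```python
def encode_bne(rs, rt, offset):
    """Assemble the 32-bit BNE instruction word from its fields."""
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register numbers are 5-bit fields")
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset must fit in 16 signed bits")
    return (0b000101 << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)
```

The range check on `offset` corresponds to the branch range given in the Programming Notes: targets outside it need a jump instruction instead.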


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
BNEL Branch on Not Equal Likely
31 26 25 21 20 16 15 0 
BNEL rs rt offset 
010101 
6 5 5 16 
Format: BNEL rs, rt, offset MIPS II
Purpose: To compare GPRs then do a PC-relative conditional branch; execute the delay slot
only if the branch is taken.


Description: if (rs ≠ rt) then branch_likely


An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after 
the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay 
slot is not executed. 


Restrictions: 


None 


Operation: 
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← (GPR[rs] ≠ GPR[rt])
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.
Breakpoint BREAK


31 26 25 6 5 0
SPECIAL code BREAK
000000 001101
6 20 6
Format: BREAK MIPS I
Purpose: To cause a Breakpoint exception. 
Description: 


A breakpoint exception occurs, immediately and unconditionally transferring control to the 
exception handler. 


The code field is available for use as software parameters, but is retrieved by the exception 
handler only by loading the contents of the memory word containing the instruction. 


Restrictions: 


None 


Operation: 
SignalException(Breakpoint) 
Exceptions: 


Breakpoint 
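
Since the code field is available to the handler only through the instruction word itself, a handler would load that word and extract bits 25..6. A minimal Python sketch of the extraction follows; the function name `break_code` is hypothetical, and how the handler obtains the instruction word is outside the scope of the sketch.

```python
def break_code(instruction_word):
    """Extract the 20-bit code field (bits 25..6) from a BREAK instruction."""
    opcode = (instruction_word >> 26) & 0x3F
    function = instruction_word & 0x3F
    if opcode != 0b000000 or function != 0b001101:
        raise ValueError("not a BREAK instruction")
    return (instruction_word >> 6) & 0xFFFFF
```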
Cache CACHE
31 26 25 21 20 16 15 0 
CACHE base op offset
101111 (see Table below)
6 5 5 16
Format: CACHE op, offset(base) MIPS III


Description: 


The 16-bit offset is sign-extended and added to the contents of general register base to form a
CacheOp virtual address (VA). The VA is translated to a physical address (PA) through the
TLB, and the 5-bit opcode (decoded in Table 1-32) specifies a cache operation for that address,
together with the affected cache. Operation of this instruction on any combination not listed in
the tables below is undefined. The operation of this instruction on uncached addresses is also
undefined.


Table 1-32. CACHE Instruction Op Field Encoding


Op Field  CACHE Instruction Variation                               Target Cache
          R5000                       R10000
00000     Index Invalidate            Index Invalidate              I
00100     Index Load Tag              Index Load Tag                I
01000     Index Store Tag             Index Store Tag               I
10000     Hit Invalidate              Hit Invalidate                I
10100     Fill                        Cache Barrier                 I (Fill)
11000     Hit Writeback               Index Load Data               I
11100     -                           Index Store Data              I
00001     Index Writeback Invalidate  Index Writeback Invalidate    D
00101     Index Load Tag              Index Load Tag                D
01001     Index Store Tag             Index Store Tag               D
01101     Create Dirty Exclusive      -                             D
10001     Hit Invalidate              Hit Invalidate                D
10101     Hit Writeback Invalidate    Hit Writeback Invalidate      D
11001     Hit Writeback               Index Load Data               D
11101     -                           Index Store Data              D
00011     Flash                       Index Writeback Invalidate    S
00111     Index Load Tag              Index Load Tag                S
01011     Index Store Tag             Index Store Tag               S
10011     -                           Hit Invalidate                S
10111     Page Invalidate             Hit Writeback Invalidate      S
11011     -                           Index Load Data               S
11111     -                           Index Store Data              S
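
Reading the encodings in Table 1-32 row by row, the two low-order bits of the op field appear to select the target cache (00 for I, 01 for D, 11 for S) and the three high-order bits select the operation. This is an observation drawn from the listed values, not something the text states, so the Python sketch below should be taken as an assumption. The names `TARGET_CACHE` and `split_cache_op` are hypothetical.

```python
# Low two bits of the op field, as they appear in Table 1-32.
TARGET_CACHE = {0b00: "I", 0b01: "D", 0b11: "S"}


def split_cache_op(op):
    """Split a 5-bit CACHE op field into (target_cache, operation_bits)."""
    if not 0 <= op < 32:
        raise ValueError("op is a 5-bit field")
    cache = TARGET_CACHE.get(op & 0b11)
    if cache is None:
        raise ValueError("encoding not listed in Table 1-32")
    return cache, op >> 2
```

The operation bits alone do not name an operation: the same value can mean different things for different caches and processors, so a full decoder would still need the table.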


Fill, Create Dirty, Hit WriteBack and Hit Set Virtual are not supported in the R5000 and the 
R10000 processors. 


The R5000 and the R10000 processors add two new CacheOps: Index Load Data (110₂) and
Index Store Data (111₂). These changes are also reflected in the CP0 TagHi, TagLo and ECC
registers.


Both of the primary instruction and data caches of the R5000 have a block size of 32 bytes (8 
data words). 


The primary instruction and data caches of the R10000 have a block size of 16 words and 32 
bytes (8 data words), respectively. 


NOTE: A 32-bit instruction is predecoded into a 36-bit instruction word before 
entering the primary instruction cache. The instruction fetch addresses remain the 
same and are not affected by the predecode. 


The secondary cache, a unified cache, has a block size of 32 bytes (R5000) or either 64 or 128 
bytes (R10000), configured during reset.


For a cache of 2^CACHESIZE bytes with 2^BLOCKSIZE bytes per tag,
VA13..5 (R5000)
VACACHESIZE-2..BLOCKSIZE (R10000) 

specifies the block for the primary cache, and 
PACACHESIZE..BLOCKSIZE (R5000) 
PACACHESIZE-2..BLOCKSIZE (R10000) 


specifies the block for the secondary cache. 


For the Index CacheOps of the R5000, virtual address bit 14 is used to specify the way, 0 or 1, 
for the CacheOp. 


For the Index CacheOps of the R10000, address bit 0 is used to specify the way, 0 or 1, for the 
CacheOp. For this reason, bit 0 is not checked for alignment-type Address Error exception for 
the Index CacheOps. 


For CacheOps that access data in caches, 


VABLOCKSIZE-1..2 (R5000) 
VABLOCKSIZE-1..2 (R10000) 


specifies a word within a block for primary caches, and 


PABLOCKSIZE-1..2 (R5000) 
PABLOCKSIZE-1..3 (R10000) 


specifies a doubleword in the secondary cache. 


A cache hit accesses the specified cache as normal data references, and performs the specified 
operation if the cache block contains valid data at the specified physical address. If the cache 
line is invalid or contains a differing physical address (a cache miss), no operation is performed. 
Since the R5000 and the R10000 processors use 2-way set associative caches, the Hit operation 
performs tag comparison in both ways of the cache. No index needs to be provided for such 
CacheOps. If both ways register a hit, the execution of the CacheOp is undefined. 


Write back from the primary data cache goes to the secondary cache, and write back from the 
secondary cache goes to the system interface. The primary data cache is written back to the 
secondary cache before the secondary cache is written back to the system interface; the address
to be written is specified by the cache tag, rather than the translated PA from the CacheOp
instruction. A secondary cache write back also interrogates the primary data cache for any dirty 
inconsistent data. 


When a line is invalidated in the secondary cache, all subset lines in the primary caches are also 
invalidated. 


CacheOps are serialized with respect to cached loads/stores and CP0 instructions. Therefore, in
general, there are no hazards for CacheOps. However, if the CacheOps modify the current
instruction fetching stream, they may not work properly since the instruction fetch pipeline
usually prefetches and buffers instructions and CacheOps are not serialized with respect to the
instruction fetch pipeline. Programmers should be aware of such potential hazards; one solution
is to put a COP0 instruction after the CacheOp to prevent the speculative execution and force
the CacheOp to complete, and then use a Jump Register instruction to flush the instruction fetch
pipeline. Succeeding instructions will then be re-fetched from caches.


If CP0 is not usable, a Coprocessor Unusable exception is taken. CacheOps may induce
Address Error or TLBL exceptions (Refill or Invalid) during address translation, but never take 
a TLBS or Mod exception. The virtual address is used to index the cache for an Index 
CacheOp, but need not match the cache physical tag; unmapped addresses may be used to avoid 
TLB exceptions. 


The R5000 and the R10000 processors do not support the CE bit, and programmers must supply 
correct parity bits or ECC for some CacheOps. 


The R5000 and the R10000 processors support the CH bit for secondary CacheOps, Hit 
Invalidate, and Hit WriteBack Invalidate. As in the R4400, a hit sets the CH bit of the Status 
register, and a miss resets it. This bit is readable and writable by software. 


Operation: 


vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA)
CacheOp (op, vAddr, pAddr)


Exceptions: 


Coprocessor Unusable
Coprocessor Operation COPz
31 26 25 0 
COPz cop_fun 
0100zz 
6 26 
Format: COP0 cop_fun MIPS I


COP1 cop_fun
COP2 cop_fun
COP3 cop_fun


Purpose: To execute a coprocessor instruction. 


Description: 


The coprocessor operation specified by cop_fun is performed by coprocessor unit zz. Details of 
coprocessor operations must be found in the specification for each coprocessor. 


Each MIPS architecture level defines up to 4 coprocessor units, numbered 0 to 3 (see 
1.2.5 Coprocessor Instructions). The opcodes corresponding to coprocessors that are not 
defined by an architecture level may be used for other instructions. 


Restrictions: 


Access to the coprocessors is controlled by system software. Each coprocessor has a 
“coprocessor usable” bit in the System Control coprocessor. The usable bit must be set for a 
user program to execute a coprocessor instruction. If the usable bit is not set, an attempt to 
execute the instruction will result in a Coprocessor Unusable exception. An unimplemented 
coprocessor must never be enabled. The result of executing this instruction for an
unimplemented coprocessor when the usable bit is set is undefined.


See specification for the specific coprocessor being programmed. 
Operation: 

CoprocessorOperation (z, cop_fun) 
Exceptions: 

Reserved Instruction 

Coprocessor Unusable 


Coprocessor interrupt or Floating-Point Exception (CP1 only for some processors) 
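
The encoding shown at the top of this page places the coprocessor number in the two low bits of the opcode (0100zz) and the 26-bit cop_fun field below it. A short Python sketch of assembling such a word follows; the function name `encode_copz` is hypothetical, and the meaning of cop_fun is left to the coprocessor specification.

```python
def encode_copz(z, cop_fun):
    """Assemble a COPz instruction word for coprocessor unit z (0 to 3)."""
    if z not in (0, 1, 2, 3):
        raise ValueError("coprocessor unit must be 0, 1, 2, or 3")
    if not 0 <= cop_fun < (1 << 26):
        raise ValueError("cop_fun is a 26-bit field")
    return ((0b010000 | z) << 26) | cop_fun
```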
DADD Doubleword Add


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DADD 
000000 00000 101100 

6 5 5 5 5 6 
Format: DADD rd, rs, rt MIPS III 
Purpose: To add 64-bit integers. If overflow occurs, then trap. 


Description: rd ← rs + rt


The 64-bit doubleword value in GPR rt is added to the 64-bit value in GPR rs to produce a 
64-bit result. If the addition results in 64-bit 2’s complement arithmetic overflow then the 
destination register is not modified and an Integer Overflow exception occurs. If it does not 
overflow, the 64-bit result is placed into GPR rd. 

Restrictions: 


None 


Operation: 64-bit processors 


temp ← GPR[rs] + GPR[rt]
if (64_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rd] ← temp
endif


Exceptions: 
Integer Overflow 
Reserved Instruction 


Programming Notes: 


DADDU performs the same arithmetic operation but does not trap on overflow.
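
The trap-on-overflow behavior can be modeled in a few lines. The following Python sketch is illustrative only: the names dadd, daddu, and IntegerOverflow, and the convention of passing registers as Python integers holding 64-bit patterns, are assumptions made for this example and are not part of the architecture definition.

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Hypothetical stand-in for the Integer Overflow exception."""


def to_signed64(x):
    """Interpret a 64-bit pattern as a two's complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def dadd(rs, rt):
    """Model DADD: return the 64-bit sum, or raise on signed overflow."""
    total = to_signed64(rs) + to_signed64(rt)
    if not -(1 << 63) <= total <= (1 << 63) - 1:
        # The destination register is left unmodified in this case.
        raise IntegerOverflow("64-bit two's complement overflow")
    return total & MASK64


def daddu(rs, rt):
    """Model DADDU: the same sum modulo 2**64, never traps."""
    return (rs + rt) & MASK64
```

In this model the only difference between the two functions is the range check, which mirrors the relationship between DADD and DADDU.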
Doubleword Add Immediate DADDI
31 26 25 21 20 16 15 0
DADDI rs rt immediate
011000
6 5 5 16
Format: DADDI rt, rs, immediate MIPS III
Purpose: To add a constant to a 64-bit integer. If overflow occurs, then trap.


Description: rt ← rs + immediate


The 16-bit signed immediate is added to the 64-bit value in GPR rs to produce a 64-bit result. 
If the addition results in 64-bit 2’s complement arithmetic overflow then the destination register 
is not modified and an Integer Overflow exception occurs. If it does not overflow, the 64-bit 
result is placed into GPR rt. 

Restrictions: 


None 


Operation: 64-bit processors 


temp ← GPR[rs] + sign_extend(immediate)
if (64_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rt] ← temp
endif


Exceptions: 
Integer Overflow 
Reserved Instruction 
Programming Notes: 


DADDIU performs the same arithmetic operation but does not trap on overflow.
DADDIU Doubleword Add Immediate Unsigned
31 26 25 21 20 16 15 0 
DADDIU rs rt immediate 
011001 
6 5 5 16 
Format: DADDIU rt, rs, immediate MIPS III 
Purpose: To add a constant to a 64-bit integer. 


Description: rt ← rs + immediate


The 16-bit signed immediate is added to the 64-bit value in GPR rs and the 64-bit arithmetic
result is placed into GPR rt.


No Integer Overflow exception occurs under any circumstances. 
Restrictions: 

None 
Operation: 64-bit processors 

GPR[rt] ← GPR[rs] + sign_extend(immediate)
Exceptions: 

Reserved Instruction 
Programming Notes: 


The term “unsigned” in the instruction name is a misnomer; this operation is 64-bit modulo 
arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, 
such as address arithmetic, or integer arithmetic environments that ignore overflow, such as “C” 
language arithmetic. 
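
Because the immediate is sign-extended before the modulo addition, DADDIU can also step a value downward, which is why it suits address arithmetic. The Python sketch below illustrates this; the helper names sign_extend16 and daddiu are hypothetical, and registers are assumed to be passed as 64-bit patterns.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate field to a 64-bit pattern."""
    imm &= 0xFFFF
    return (imm | ~0xFFFF) & MASK64 if imm & 0x8000 else imm


def daddiu(rs, imm):
    """Model DADDIU: add modulo 2**64 with no overflow trap."""
    return (rs + sign_extend16(imm)) & MASK64
```

For example, an immediate field of 0xFFFC acts as -4, so daddiu(0x1000, 0xFFFC) yields 0x0FFC.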


Doubleword Add Unsigned DADDU 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DADDU 
000000 00000 101101
6 5 5 5 5 6
Format: DADDU rd, rs, rt MIPS III


Purpose: To add 64-bit integers. 


Description: rd ← rs + rt


The 64-bit doubleword value in GPR rt is added to the 64-bit value in GPR rs and the 64-bit 
arithmetic result is placed into GPR rd. 


No Integer Overflow exception occurs under any circumstances. 
Restrictions: 
None 
Operation: 64-bit processors 
GPR[rd] ← GPR[rs] + GPR[rt]
Exceptions: 
Reserved Instruction 
Programming Notes: 


The term “unsigned” in the instruction name is a misnomer; this operation is 64-bit modulo 
arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, 
such as address arithmetic, or integer arithmetic environments that ignore overflow, such as “C” 
language arithmetic. 
DDIV Doubleword Divide


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DDIV 
000000 00 0000 0000 011110 
6 5 5 10 6 
Format: DDIV rs, rt MIPS III


Purpose: To divide 64-bit signed integers.


Description: (LO, HI) ← rs / rt


The 64-bit doubleword in GPR rs is divided by the 64-bit doubleword in GPR rt, treating both 
operands as signed values. The 64-bit quotient is placed into special register LO and the 64-bit 
remainder is placed into special register HI. 


No arithmetic exception occurs under any circumstances. 


Restrictions: 


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO 
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them by two or more other instructions. 


If the divisor in GPR rt is zero, the arithmetic result value is undefined. 


Operation: 64-bit processors 


I-2:, I-1: LO, HI ← undefined
I:         LO ← GPR[rs] div GPR[rt]
           HI ← GPR[rs] mod GPR[rt]


Exceptions: 
Reserved Instruction 


Programming Notes: 


See the Programming Notes for the DIV instruction. 


Doubleword Divide Unsigned DDIVU 
31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DDIVU 
000000 000000 0000 011111 
6 5 5 10 6 

Format: DDIVU rs, rt MIPS III


Purpose: To divide 64-bit unsigned integers.


Description: (LO, HI) ← rs / rt


The 64-bit doubleword in GPR rs is divided by the 64-bit doubleword in GPR rt, treating both 
operands as unsigned values. The 64-bit quotient is placed into special register LO and the 
64-bit remainder is placed into special register HI. 


No arithmetic exception occurs under any circumstances. 


Restrictions: 


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO 
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them by two or more other instructions. 


If the divisor in GPR rt is zero, the arithmetic result value is undefined. 


Operation: 64-bit processors 


I-2:, I-1: LO, HI ← undefined
I:         LO ← (0 || GPR[rs]) div (0 || GPR[rt])
           HI ← (0 || GPR[rs]) mod (0 || GPR[rt])


Exceptions: 
Reserved instruction 


Programming Notes: 


See the Programming Notes for the DIV instruction. 
DIV Divide Word


31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt 0 DIV
000000 00 0000 0000 011010

6 5 5 10 6
Format: DIV rs, rt MIPS I


Purpose: To divide 32-bit signed integers.


Description: (LO, HI) ← rs / rt


The 32-bit word value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands 
as signed values. The 32-bit quotient is placed into special register LO and the 32-bit remainder 
is placed into special register HI. 


No arithmetic exception occurs under any circumstances. 


Restrictions: 


On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO 
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them by two or more other instructions. 


If the divisor in GPR rt is zero, the arithmetic result value is undefined. 


Operation: 


if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
I-2:, I-1: LO, HI ← undefined
I:         q  ← GPR[rs]_{31..0} div GPR[rt]_{31..0}
           LO ← sign_extend(q_{31..0})
           r  ← GPR[rs]_{31..0} mod GPR[rt]_{31..0}
           HI ← sign_extend(r_{31..0})
Exceptions: 
None 


Programming Notes: 


In some processors the integer divide operation may proceed asynchronously and allow other 
CPU instructions to execute before it is complete. An attempt to read LO or HI before the results
are written will wait (interlock) until the results are ready. Asynchronous execution does not
affect the program result, but offers an opportunity for performance improvement by scheduling
the divide so that other instructions can execute in parallel. 


No arithmetic exception occurs under any circumstances. If divide-by-zero or overflow 
conditions should be detected and some action taken, then the divide instruction is typically 
followed by additional instructions to check for a zero divisor and/or for overflow. If the divide 
is asynchronous then the zero-divisor check can execute in parallel with the divide. The action 
taken on either divide-by-zero or overflow is either a convention within the program itself or 
more typically, the system software; one possibility is to take a BREAK exception with a code 
field value to signal the problem to the system software. 


As an example, the C programming language in a UNIX™ environment expects division by 
zero to either terminate the program or execute a program-specified signal handler. C does not 
expect overflow to cause any exceptional condition. If the C compiler uses a divide instruction, 
it also emits code to test for a zero divisor and execute a BREAK instruction to inform the 
operating system if one is detected. 
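
The compiler convention described above can be sketched as follows. This Python model is an illustration under stated assumptions: the names div_word and BreakException are hypothetical, the quotient is assumed to truncate toward zero with the remainder taking the sign of the dividend (this section does not spell out the rounding rule), and the zero-divisor test stands in for the check-and-BREAK sequence that a compiler might emit after the divide.

```python
MASK32 = 0xFFFFFFFF


class BreakException(Exception):
    """Hypothetical stand-in for a BREAK taken on a zero divisor."""


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def div_word(rs, rt):
    """Return (LO, HI) as signed integers for 32-bit signed operands."""
    n, d = to_signed32(rs), to_signed32(rt)
    if d == 0:
        # The hardware result is undefined here; software checks and traps.
        raise BreakException("divide by zero")
    q = abs(n) // abs(d)          # truncate toward zero (assumed)
    if (n < 0) != (d < 0):
        q = -q
    r = n - q * d                 # remainder follows the dividend's sign
    return to_signed32(q), r
```

Overflow (the most negative word divided by -1) is not trapped in this model; the quotient simply wraps to 32 bits, in keeping with the statement that C does not expect overflow to cause an exceptional condition.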
DIVU Divide Unsigned Word
31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt 0 DIVU
000000 00 0000 0000 011011
6 5 5 10 6

Format: DIVU rs, rt MIPS I


Purpose: To divide 32-bit unsigned integers.


Description: (LO, HI) ← rs / rt


The 32-bit word value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands 
as unsigned values. The 32-bit quotient is placed into special register LO and the 32-bit 
remainder is placed into special register HI. 


No arithmetic exception occurs under any circumstances. 


Restrictions: 


On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO 
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them, like this one, by two or more other instructions. 


If the divisor in GPR rt is zero, the arithmetic result is undefined. 


Operation: 


if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
I-2:, I-1: LO, HI ← undefined
I:         q  ← (0 || GPR[rs]_{31..0}) div (0 || GPR[rt]_{31..0})
           LO ← sign_extend(q_{31..0})
           r  ← (0 || GPR[rs]_{31..0}) mod (0 || GPR[rt]_{31..0})
           HI ← sign_extend(r_{31..0})
Exceptions: 
None 


Programming Notes: 


See the Programming Notes for the DIV instruction. 
Doubleword Move From System Control Coprocessor DMFC0
31 26 25 21 20 16 15 11 10 0
COP0 DMF rt rd 0
010000 00001 000 0000 0000
6 5 5 5 11
Format: DMFC0 rt, rd MIPS III
Description:


The contents of coprocessor register rd of the CP0 are loaded into general register rt.


This operation is defined for the R5000 and the R10000 operating in 64-bit mode and in 32-bit 
kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved 
instruction exception. All 64 bits of the general register destination are written from the
coprocessor register source. The operation of DMFC0 on a 32-bit coprocessor 0 register is
undefined.


Operation: 64-bit processors 
data ← CPR[0,rd]
GPR[rt] ← data


Exceptions: 
Coprocessor unusable
Reserved instruction (in 32-bit user mode or 32-bit supervisor mode)
DMTC0 Doubleword Move To System Control Coprocessor
31 26 25 21 20 16 15 11 10 0
COP0 DMT rt rd 0
010000 00101 000 0000 0000
6 5 5 5 11
Format: DMTC0 rt, rd MIPS III
Description:


The contents of general register rt are loaded into coprocessor register rd of the CP0.


This operation is defined for the R5000 and the R10000 operating in 64-bit mode or in 32-bit 
kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved 
instruction exception. 


All 64 bits of the coprocessor 0 register are written from the general register source. The
operation of DMTC0 on a 32-bit coprocessor 0 register is undefined.


Because the state of the virtual address translation system may be altered by this instruction, the 
operation of load instructions, store instructions, and TLB operations immediately prior to and 
after this instruction is undefined.


Operation: 64-bit processors 
data ← GPR[rt]
CPR[0,rd] ← data
Exceptions: 


Coprocessor unusable (In 32-bit user mode 
In 32-bit supervisor mode) 
Doubleword Multiply DMULT


31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt 0 DMULT
000000 00 0000 0000 011100

6 5 5 10 6
Format: DMULT rs, rt MIPS III


Purpose: To multiply 64-bit signed integers.


Description: (LO, HI) ← rs × rt


The 64-bit doubleword value in GPR rt is multiplied by the 64-bit value in GPR rs, treating both 
operands as signed values, to produce a 128-bit result. The low-order 64-bit doubleword of the 
result is placed into special register LO, and the high-order 64-bit doubleword is placed into 
special register HI.


No arithmetic exception occurs under any circumstances. 


Restrictions: 


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO 
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them by two or more other instructions. 


Operation: 64-bit processors 
I-2:, I-1: LO, HI ← undefined
I:         prod ← GPR[rs] * GPR[rt]
           LO ← prod_{63..0}
           HI ← prod_{127..64}


Exceptions: 


Reserved Instruction 


Programming Notes: 


In some processors the integer multiply operation may proceed asynchronously and allow other 
CPU instructions to execute before it is complete. An attempt to read LO or HI before the results
are written will wait (interlock) until the results are ready. Asynchronous execution does not 
affect the program result, but offers an opportunity for performance improvement by scheduling 
the multiply so that other instructions can execute in parallel. 


Programs that require overflow detection must check for it explicitly. 
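
One way a program might perform the explicit overflow check is to compare HI against the sign extension of LO. The Python sketch below models the 128-bit signed product and that test; the function names dmult and dmult_overflows are hypothetical, and registers are assumed to be passed as 64-bit patterns.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed64(x):
    """Interpret a 64-bit pattern as a two's complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def dmult(rs, rt):
    """Return (LO, HI): the low and high halves of the signed product."""
    prod = (to_signed64(rs) * to_signed64(rt)) & MASK128
    return prod & MASK64, prod >> 64


def dmult_overflows(rs, rt):
    """True if the signed product does not fit in 64 bits."""
    lo, hi = dmult(rs, rt)
    # The product fits when HI equals bit 63 of LO replicated 64 times.
    return hi != (MASK64 if lo >> 63 else 0)
```

On real hardware the same test would take a few instructions after MFLO and MFHI; the model only shows the condition being tested.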




DMULTU Doubleword Multiply Unsigned


31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt 0 DMULTU
000000 00 0000 0000 011101

6 5 5 10 6
Format: DMULTU rs, rt MIPS III


Purpose: To multiply 64-bit unsigned integers.


Description: (LO, HI) ← rs × rt


The 64-bit doubleword value in GPR rt is multiplied by the 64-bit value in GPR rs, treating both 
operands as unsigned values, to produce a 128-bit result. The low-order 64-bit doubleword of 
the result is placed into special register LO, and the high-order 64-bit doubleword is placed into 
special register HI.


No arithmetic exception occurs under any circumstances. 


Restrictions: 


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO 
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them by two or more other instructions. 


Operation: 64-bit processors 
I-2:, I-1: LO, HI ← undefined
I:         prod ← (0 || GPR[rs]) * (0 || GPR[rt])
           LO ← prod_{63..0}
           HI ← prod_{127..64}
Exceptions: 


Reserved Instruction 
Doubleword Shift Left Logical DSLL
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSLL
000000 00000 111000
6 5 5 5 5 6
Format: DSLL rd, rt, sa MIPS III

Purpose: To left shift a doubleword by a fixed amount of 0 to 31 bits.


Description: rd ← rt << sa


The 64-bit doubleword contents of GPR rt are shifted left, inserting zeros into the emptied bits; 
the result is placed in GPR rd. The bit shift count in the range 0 to 31 is specified by sa. 


Restrictions: 


None 


Operation: 64-bit processors 


s ← 0 || sa
GPR[rd] ← GPR[rt]_{(63-s)..0} || 0^s


Exceptions: 


Reserved Instruction 
DSLL32 Doubleword Shift Left Logical Plus 32
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSLL32
000000 00000 111100
6 5 5 5 5 6
Format: DSLL32 rd, rt, sa MIPS III

Purpose: To left shift a doubleword by a fixed amount of 32 to 63 bits.


Description: rd ← rt << (sa+32)


The 64-bit doubleword contents of GPR rt are shifted left, inserting zeros into the emptied bits; 
the result is placed in GPR rd. The bit shift count in the range 32 to 63 is specified by sa+32. 


Restrictions: 


None 


Operation: 64-bit processors 


s ← 1 || sa /* 32+sa */
GPR[rd] ← GPR[rt]_{(63-s)..0} || 0^s
Exceptions: 


Reserved Instruction 
Doubleword Shift Left Logical Variable DSLLV
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 DSLLV
000000 00000 010100
6 5 5 5 5 6
Format: DSLLV rd, rt, rs MIPS III

Purpose: To left shift a doubleword by a variable number of bits.


Description: rd ← rt << rs


The 64-bit doubleword contents of GPR rt are shifted left, inserting zeros into the emptied bits;
the result is placed in GPR rd. The bit shift count in the range 0 to 63 is specified by the
low-order six bits in GPR rs.


Restrictions: 


None 


Operation: 64-bit processors 


s ← 0 || GPR[rs]_{5..0}
GPR[rd] ← GPR[rt]_{(63-s)..0} || 0^s


Exceptions: 


Reserved Instruction 
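
The three doubleword left shifts differ only in how the shift count s is formed. The Python sketch below makes that explicit; the function names are hypothetical and registers are assumed to be passed as 64-bit patterns.

```python
MASK64 = (1 << 64) - 1


def dsll(rt, sa):
    """DSLL: the count is the 5-bit sa field (0 to 31)."""
    return (rt << (sa & 0x1F)) & MASK64


def dsll32(rt, sa):
    """DSLL32: the count is sa + 32 (32 to 63)."""
    return (rt << ((sa & 0x1F) + 32)) & MASK64


def dsllv(rt, rs):
    """DSLLV: the count is the low-order six bits of rs (0 to 63)."""
    return (rt << (rs & 0x3F)) & MASK64
```

Note that in this model dsllv(rt, 64) leaves rt unchanged, because only the low-order six bits of the count register are used.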
DSRA Doubleword Shift Right Arithmetic
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSRA
000000 00000 111011
6 5 5 5 5 6
Format: DSRA rd, rt, sa MIPS III

Purpose: To arithmetic right shift a doubleword by a fixed amount of 0 to 31 bits.


Description: rd ← rt >> sa (arithmetic)


The 64-bit doubleword contents of GPR rt are shifted right, duplicating the sign bit (63) into the 
emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 31 is specified 
by sa. 


Restrictions: 
None 


Operation: 64-bit processors 


s ← 0 || sa
GPR[rd] ← (GPR[rt]_63)^s || GPR[rt]_{63..s}


Exceptions: 


Reserved Instruction 
Doubleword Shift Right Arithmetic Plus 32 DSRA32
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSRA32
000000 00000 111111
6 5 5 5 5 6
Format: DSRA32 rd, rt, sa MIPS III

Purpose: To arithmetic right shift a doubleword by a fixed amount of 32 to 63 bits.


Description: rd ← rt >> (sa+32) (arithmetic)


The doubleword contents of GPR rt are shifted right, duplicating the sign bit (63) into the 
emptied bits; the result is placed in GPR rd. The bit shift count in the range 32 to 63 is specified 
by sa+32. 


Restrictions: 


None 


Operation: 64-bit processors 


s ← 1 || sa /* 32+sa */
GPR[rd] ← (GPR[rt]_63)^s || GPR[rt]_{63..s}


Exceptions: 


Reserved Instruction 
DSRAV Doubleword Shift Right Arithmetic Variable


31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 DSRAV
000000 00000 010111
6 5 5 5 5 6
Format: DSRAV rd, rt, rs MIPS III
Purpose: To arithmetic right shift a doubleword by a variable number of bits.


Description: rd ← rt >> rs (arithmetic)


The doubleword contents of GPR rt are shifted right, duplicating the sign bit (63) into the 
emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 63 is specified 
by the low-order six bits in GPR rs. 


Restrictions: 
None 


Operation: 64-bit processors 


s ← GPR[rs]_{5..0}
GPR[rd] ← (GPR[rt]_63)^s || GPR[rt]_{63..s}


Exceptions: 


Reserved Instruction 
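
The sign-bit replication used by the arithmetic right shifts can be sketched as follows. This Python model is illustrative only; the helper shift_right_arithmetic and the wrapper names are hypothetical, and registers are assumed to be passed as 64-bit patterns.

```python
MASK64 = (1 << 64) - 1


def shift_right_arithmetic(value, s):
    """Shift a 64-bit pattern right by s (0 to 63), replicating bit 63."""
    value &= MASK64
    sign = value >> 63
    # Bit 63 repeated s times, placed in the vacated high-order bits.
    fill = (((1 << s) - 1) << (64 - s)) if sign else 0
    return (value >> s) | fill


def dsra(rt, sa):
    """DSRA: the count is the 5-bit sa field (0 to 31)."""
    return shift_right_arithmetic(rt, sa & 0x1F)


def dsra32(rt, sa):
    """DSRA32: the count is sa + 32 (32 to 63)."""
    return shift_right_arithmetic(rt, (sa & 0x1F) + 32)


def dsrav(rt, rs):
    """DSRAV: the count is the low-order six bits of rs (0 to 63)."""
    return shift_right_arithmetic(rt, rs & 0x3F)
```

A logical right shift would use the same structure with fill fixed at zero.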
Doubleword Shift Right Logical DSRL
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSRL
000000 00000 111010
6 5 5 5 5 6
Format: DSRL rd, rt, sa MIPS III

Purpose: To logical right shift a doubleword by a fixed amount of 0 to 31 bits.


Description: rd ← rt >> sa (logical)


The doubleword contents of GPR rt are shifted right, inserting zeros into the emptied bits; the 
result is placed in GPR rd. The bit shift count in the range 0 to 31 is specified by sa. 


Restrictions: 


None 


Operation: 64-bit processors 


s ← 0 || sa
GPR[rd] ← 0^s || GPR[rt]_{63..s}


Exceptions: 


Reserved Instruction 
DSRL32 Doubleword Shift Right Logical Plus 32
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSRL32
000000 00000 111110
6 5 5 5 5 6
Format: DSRL32 rd, rt, sa MIPS III

Purpose: To logical right shift a doubleword by a fixed amount of 32 to 63 bits.


Description: rd ← rt >> (sa+32) (logical)


The 64-bit doubleword contents of GPR rt are shifted right, inserting zeros into the emptied bits; 
the result is placed in GPR rd. The bit shift count in the range 32 to 63 is specified by sa+32. 


Restrictions: 


None 


Operation: 64-bit processors 


s ← 1 || sa /* 32+sa */
GPR[rd] ← 0^s || GPR[rt]_{63..s}
Exceptions: 


Reserved Instruction 
Doubleword Shift Right Logical Variable DSRLV
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 DSRLV
000000 00000 010110
6 5 5 5 5 6
Format: DSRLV rd, rt, rs MIPS III

Purpose: To logical right shift a doubleword by a variable number of bits.


Description: rd ← rt >> rs (logical)


The 64-bit doubleword contents of GPR rt are shifted right, inserting zeros into the emptied bits;
the result is placed in GPR rd. The bit shift count in the range 0 to 63 is specified by the
low-order six bits in GPR rs.


Restrictions: 


None 


Operation: 64-bit processors 


s ← GPR[rs]_{5..0}
GPR[rd] ← 0^s || GPR[rt]_{63..s}


Exceptions: 


Reserved Instruction 
DSUB Doubleword Subtract


31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 DSUB
000000 00000 101110
6 5 5 5 5 6
Format: DSUB rd, rs, rt MIPS III
Purpose: To subtract 64-bit integers; trap if overflow.


Description: rd ← rs - rt


The 64-bit doubleword value in GPR rt is subtracted from the 64-bit value in GPR rs to produce 
a 64-bit result. If the subtraction results in 64-bit 2’s complement arithmetic overflow then the 
destination register is not modified and an Integer Overflow exception occurs. If it does not 
overflow, the 64-bit result is placed into GPR rd. 

Restrictions: 


None 
Operation: 64-bit processors 

temp ← GPR[rs] - GPR[rt]
if (64_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rd] ← temp
endif


Exceptions: 
Integer Overflow 
Reserved Instruction 


Programming Notes: 


DSUBU performs the same arithmetic operation but does not trap on overflow.


Doubleword Subtract Unsigned DSUBU
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 DSUBU
000000 00000 101111
6 5 5 5 5 6
Format: DSUBU rd, rs, rt MIPS III


Purpose: To subtract 64-bit integers.


Description: rd ← rs - rt


The 64-bit doubleword value in GPR rt is subtracted from the 64-bit value in GPR rs and the 
64-bit arithmetic result is placed into GPR rd. 


No Integer Overflow exception occurs under any circumstances. 
Restrictions: 


None 


Operation: 64-bit processors 
GPR[rd] ← GPR[rs] - GPR[rt]
Exceptions: 


Reserved Instruction 


Programming Notes: 


The term “unsigned” in the instruction name is a misnomer; this operation is 64-bit modulo 
arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, 
such as address arithmetic, or integer arithmetic environments that ignore overflow, such as “C” 
language arithmetic. 
ERET Exception Return


31 26 25 24 6 5 0
COP0 CO 0 ERET
010000 1 000 0000 0000 0000 0000 011000
6 1 19 6
Format: ERET MIPS III
Description: 


ERET is the R5000 and the R10000 instruction for returning from an interrupt, exception, or 
error trap. Unlike a branch or jump instruction, ERET does not execute the next instruction. 


ERET must not itself be placed in a branch delay slot. 


If the processor is servicing an error trap (SR_2 = 1), then load the PC from the ErrorEPC and
clear the ERL bit of the Status register (SR_2). Otherwise (SR_2 = 0), load the PC from the EPC,
and clear the EXL bit of the Status register (SR_1).


An ERET executed between a LL and SC also causes the SC to fail. 


If there is no exception (EXL=0 and ERL=0 in the Status register), execution of an ERET 
instruction is meaningless. 


Execution of an ERET when ERL=0, regardless of the state of EXL, sets EXL to 0 and a jump 
is taken to the address presently held in the EPC register, even when there is no exception. 


Operation: 


if SR_2 = 1 then
    PC ← ErrorEPC
    SR ← SR_{31..3} || 0 || SR_{1..0}
else
    PC ← EPC
    SR ← SR_{31..2} || 0 || SR_0
endif
LLbit ← 0


Exceptions: 


Coprocessor unusable 
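
The Status register update performed by ERET can be modeled as below. This Python sketch is illustrative only: the function name eret and its argument list are hypothetical, and the bit positions (ERL in bit 2, EXL in bit 1) are taken from the Operation section, which clears bit 2 in the error-trap case and bit 1 otherwise.

```python
ERL = 1 << 2  # Status register bit 2 (error level)
EXL = 1 << 1  # Status register bit 1 (exception level)


def eret(status, epc, error_epc):
    """Return (new_pc, new_status, llbit) after an ERET."""
    if status & ERL:
        pc = error_epc
        status &= ~ERL      # clear ERL, leave the other bits alone
    else:
        pc = epc
        status &= ~EXL      # clear EXL even if it was already 0
    return pc, status & 0xFFFFFFFF, 0
```

The third element of the result is always 0, reflecting that ERET clears LLbit and so causes a pending SC to fail.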
Jump J
31 26 25 0
J instr_index
000010
6 26
Format: J target MIPS I


Purpose: To branch within the current 256 MB aligned region. 


Description: 


This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256 
MB aligned region. The low 28 bits of the target address is the instr_index field shifted left 2 
bits. The remaining upper bits are the corresponding bits of the address of the instruction in the 
delay slot (not the branch itself). 


Jump to the effective target address. Execute the instruction following the jump, in the branch 
delay slot, before jumping. 
Restrictions: 
None 
Operation: 
I:
I+1: PC ← PC_{GPRLEN..28} || instr_index || 0^2
Exceptions: 


None 


Programming Notes: 


Forming the branch target address by catenating PC and index bits rather than adding a signed 
offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned 
on a 256 MB boundary. It allows a branch to anywhere in the region from anywhere in the 
region which a signed relative offset would not allow. 


This definition creates the boundary case where the branch instruction is in the last word of a 
256 MB region and can therefore only branch to the following 256 MB region containing the 
branch delay slot. 
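
The target calculation, including the boundary case, can be sketched in Python. The function name jump_target and the addr_bits parameter are assumptions made for this example; pc is the address of the jump instruction itself, so the delay slot is at pc + 4.

```python
def jump_target(pc, instr_index, addr_bits=64):
    """Effective target of a J (or JAL) located at address pc."""
    mask = (1 << addr_bits) - 1
    delay_slot = (pc + 4) & mask
    # Upper bits come from the delay-slot address, not the jump itself.
    upper = delay_slot & ~0x0FFFFFFF & mask
    # Low 28 bits are the 26-bit instr_index field shifted left by 2.
    return upper | ((instr_index & 0x03FFFFFF) << 2)
```

With a jump in the last word of a region, for example at 0x8FFFFFFC, the delay slot falls at 0x90000000, so every reachable target lies in the following 256 MB region.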
JAL Jump And Link


31 26 25 0
JAL instr_index
000011
6 26
Format: JAL target MIPS I
Purpose: To procedure call within the current 256 MB aligned region. 
Description: 


Place the return address link in GPR 31. The return link is the address of the second instruction 
following the branch, where execution would continue after a procedure call. 


This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256 
MB aligned region. The low 28 bits of the target address is the instr_index field shifted left 2 
bits. The remaining upper bits are the corresponding bits of the address of the instruction in the 
delay slot (not the branch itself). 


Jump to the effective target address. Execute the instruction following the jump, in the branch 
delay slot, before jumping. 

Restrictions: 
None 


Operation: 

I:   GPR[31] ← PC + 8
I+1: PC ← PC_{GPRLEN..28} || instr_index || 0^2
Exceptions: 


None 


Programming Notes: 


Forming the branch target address by catenating PC and index bits rather than adding a signed 
offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned 
on a 256 MB boundary. It allows a branch to anywhere in the region from anywhere in the 
region which a signed relative offset would not allow. 


This definition creates the boundary case where the branch instruction is in the last word of a 
256 MB region and can therefore only branch to the following 256 MB region containing the 
branch delay slot. 
Jump And Link Register JALR
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs 0 rd 0 JALR
000000 00000 00000 001001
6 5 5 5 5 6
Format: JALR rs (rd = 31 implied) MIPS I
        JALR rd, rs
Purpose: To procedure call to an instruction address in a register.


Description: rd ← return_addr, PC ← rs


Place the return address link in GPR rd. The return link is the address of the second instruction
following the branch, where execution would continue after a procedure call.


Jump to the effective target address in GPR rs. Execute the instruction following the jump, in 
the branch delay slot, before jumping. 


Restrictions: 


Register specifiers rs and rd must not be equal, because such an instruction does not have the 
same effect when re-executed. The result of executing such an instruction is undefined. This 
restriction permits an exception handler to resume execution by re-executing the branch when 
an exception occurs in the branch delay slot. 


The effective target address in GPR rs must be naturally aligned. If either of the two
least-significant bits is not zero, then an Address Error exception occurs, not for the jump
instruction, but when the branch target is subsequently fetched as an instruction.


Operation: 
I:   temp ← GPR[rs]
     GPR[rd] ← PC + 8
I+1: PC ← temp
Exceptions: 


None 


Programming Notes: 


This is the only branch-and-link instruction that can select a register for the return link; all other 
link instructions use GPR 31. The default register for GPR rd, if omitted in the assembly
language instruction, is GPR 31. 
Jump Register JR


31 26 25 21 20 6 5 0
SPECIAL rs 0 JR
000000 000 0000 0000 0000 001000
6 5 15 6
Format: JR rs MIPS I
Purpose: To branch to an instruction address in a register.


Description: PC ← rs


Jump to the effective target address in GPR rs. Execute the instruction following the jump, in 
the branch delay slot, before jumping. 


Restrictions: 


The effective target address in GPR rs must be naturally aligned. If either of the two
least-significant bits is not zero, then an Address Error exception occurs, not for the jump
instruction, but when the branch target is subsequently fetched as an instruction.


Operation: 
I:   temp ← GPR[rs]
I+1: PC ← temp


Exceptions: 


None 
Load Byte LB
31 26 25 21 20 16 15 0
LB base rt offset
100000
6 5 5 16
Format: LB rt, offset(base) MIPS I
Purpose: To load a byte from memory as a signed value.


Description: rt ← memory[base+offset]


The contents of the 8-bit byte at the memory location specified by the effective address are 
fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents 
of GPR base to form the effective address. 


Restrictions: 


None 


Operation: 32-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{(PSIZE-1)..2} || (pAddr_{1..0} xor ReverseEndian^2)
memword ← LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
byte ← vAddr_{1..0} xor BigEndianCPU^2
GPR[rt] ← sign_extend(memword_{7+8*byte..8*byte})


Operation: 64-bit processors 
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{(PSIZE-1)..3} || (pAddr_{2..0} xor ReverseEndian^3)
memdouble ← LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor BigEndianCPU^3
GPR[rt] ← sign_extend(memdouble_{7+8*byte..8*byte})

Exceptions: 
TLB Refill, TLB Invalid 


Address Error 
LBU Load Byte Unsigned
31 26 25 21 20 16 15 0
LBU base rt offset
100100
6 5 5 16
Format: LBU rt, offset(base) MIPS I
Purpose: To load a byte from memory as an unsigned value.


Description: rt ← memory[base+offset]


The contents of the 8-bit byte at the memory location specified by the effective address are 
fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents 
of GPR base to form the effective address. 


Restrictions: 


None 


Operation: 32-bit processors 
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{(PSIZE-1)..2} || (pAddr_{1..0} xor ReverseEndian^2)
memword ← LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
byte ← vAddr_{1..0} xor BigEndianCPU^2
GPR[rt] ← zero_extend(memword_{7+8*byte..8*byte})


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{(PSIZE-1)..3} || (pAddr_{2..0} xor ReverseEndian^3)
memdouble ← LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor BigEndianCPU^3
GPR[rt] ← zero_extend(memdouble_{7+8*byte..8*byte})


Exceptions: 
TLB Refill, TLB Invalid 
Address Error 
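
The byte-lane selection and the difference between sign extension (LB) and zero extension (LBU) can be sketched as follows for the 32-bit case. The Python function load_byte and its parameters are hypothetical; the model takes an already fetched 32-bit memory word and ignores address translation and the ReverseEndian adjustment.

```python
def load_byte(memword, vaddr, big_endian_cpu, signed=True):
    """Extract one byte from a 32-bit memory word as LB or LBU would."""
    # byte <- vAddr bits 1..0 xor BigEndianCPU replicated twice
    byte = (vaddr & 0x3) ^ (0x3 if big_endian_cpu else 0x0)
    value = (memword >> (8 * byte)) & 0xFF
    if signed and value & 0x80:
        value -= 0x100          # sign-extend, as LB does
    return value
```

With memword 0x11223384 and address bits 00, a little-endian CPU selects 0x84 (returned as -124 when signed), while a big-endian CPU selects 0x11.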
LD Load Doubleword
31 26 25 21 20 16 15 0
LD base rt offset
110111
6 5 5 16
Format: LD rt, offset(base) MIPS III
Purpose: To load a doubleword from memory. 


Description: rt ← memory[base+offset]


The contents of the 64-bit doubleword at the memory location specified by the aligned effective
address are fetched and placed in GPR rt. The 16-bit signed offset is added to the contents of
GPR base to form the effective address.


Restrictions: 


The effective address must be naturally aligned. If any of the three least-significant bits of the
address are non-zero, an Address Error exception occurs.


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the
instruction is undefined.
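
The alignment rule can be checked with a mask on the effective address. The sketch below is a minimal illustration in Python; the function names are hypothetical and the check stands in for the Address Error test only, not for the full load.

```python
def effective_address(base: int, offset_field: int, reg_bits: int = 64) -> int:
    """Add the sign-extended 16-bit offset field to the base register value."""
    offset = offset_field - 0x10000 if offset_field & 0x8000 else offset_field
    return (base + offset) & ((1 << reg_bits) - 1)


def is_naturally_aligned(addr: int, size_bytes: int) -> bool:
    """True when the low log2(size_bytes) address bits are all zero."""
    return addr & (size_bytes - 1) == 0


addr = effective_address(0x1000, 0x0008)
print(is_naturally_aligned(addr, 8))      # doubleword access at 0x1008
print(is_naturally_aligned(addr + 4, 8))  # 0x100C: low three bits are non-zero
```

The same helper covers halfword (2) and word (4) accesses, since the rule differs only in how many low-order bits must be zero.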


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^{3} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
memdouble ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
GPR[rt] ← memdouble

Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error
Reserved Instruction
LDCz Load Doubleword to Coprocessor
31 26 25 21 20 16 15 0
LDCz base rt offset
1101zz
6 5 5 16
Format: LDC1 rt, offset(base) MIPS II
LDC2 rt, offset(base)
Purpose: To load a doubleword from memory to a coprocessor general register. 


Description: rt ← memory[base+offset]


The contents of the 64-bit doubleword at the memory location specified by the aligned effective 
address are fetched and made available to coprocessor unit zz. The 16-bit signed offset is added 
to the contents of GPR base to form the effective address. 


The manner in which each coprocessor uses the data is defined by the individual coprocessor 
specifications. The usual operation would place the data into coprocessor general register rt. 


Each MIPS architecture level defines up to 4 coprocessor units, numbered 0 to 3 (see 
1.2.5 Coprocessor Instructions). The opcodes corresponding to coprocessors that are not 
defined by an architecture level may be used for other instructions. 


Restrictions: 


Access to the coprocessors is controlled by system software. Each coprocessor has a 
“coprocessor usable” bit in the System Control coprocessor. The usable bit must be set for a 
user program to execute a coprocessor instruction. If the usable bit is not set, an attempt to 
execute the instruction will result in a Coprocessor Unusable exception. An unimplemented 
coprocessor must never be enabled. The result of executing this instruction for an 
unimplemented coprocessor when the usable bit is set, is undefined. 


This instruction is not available for coprocessor 0, the System Control coprocessor, and the 
opcode may be used for other instructions. 


The effective address must be naturally aligned. If any of the three least-significant bits of the 
effective address are non-zero, an Address Error exception occurs. 


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^{3} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
memdouble ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
COP_LD (z, rt, memdouble)


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^{3} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
memdouble ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
COP_LD (z, rt, memdouble)


Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error
Reserved Instruction
Coprocessor Unusable
LDL Load Doubleword Left
31 26 25 21 20 16 15 0
LDL base rt offset
011010
6 5 5 16
Format: LDL rt, offset(base) MIPS III
Purpose: To load the most-significant part of a doubleword from an unaligned memory 
address. 


Description: rt ← rt MERGE memory[base+offset]


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the most-significant of eight consecutive bytes forming a 
doubleword in memory (DW) starting at an arbitrary byte boundary. A part of DW, the most- 
significant one to eight bytes, is in the aligned doubleword containing EffAddr. This part of DW 
is loaded appropriately into the most-significant (left) part of GPR rt leaving the remainder of 
GPR rt unchanged. 


The figure below illustrates this operation for big-endian byte ordering. The eight consecutive 
bytes in 2..9 form an unaligned doubleword starting at location 2. A part of DW, six bytes, is 
contained in the aligned doubleword containing the most-significant byte at 2. First, LDL loads 
these six bytes into the left part of the destination register and leaves the remainder of the 
destination unchanged. Next, the complementary LDR loads the remainder of the unaligned 
doubleword. 


[Figure: doubleword at byte 2 in memory, big-endian byte order; each memory byte contains its address. Rows show memory bytes 0 to 15 (most to least significance), the initial contents of GPR 24, the contents after executing LDL $24,2($0), and the contents after LDR $24,9($0).]

Figure 1-2 Unaligned Doubleword Load using LDL and LDR




The bytes loaded from memory to the destination register depend on both the offset of the
effective address within an aligned doubleword, i.e. the low three bits of the address (vAddr_{2..0}),
and the current byte ordering mode of the processor (big- or little-endian). The table below
shows the bytes loaded for every combination of offset and byte ordering.


Table 1-33 Bytes Loaded by LDL Instruction

Memory contents and byte offsets (vAddr_{2..0}), most to least significance:
  big-endian offset      0  1  2  3  4  5  6  7
  memory contents        I  J  K  L  M  N  O  P
  little-endian offset   7  6  5  4  3  2  1  0

Initial contents of destination register, most to least significance:
  a  b  c  d  e  f  g  h

Destination register contents after instruction (lowercase bytes are unchanged):

  Big-endian byte ordering   vAddr_{2..0}   Little-endian byte ordering
  I J K L M N O P                 0         P b c d e f g h
  J K L M N O P h                 1         O P c d e f g h
  K L M N O P g h                 2         N O P d e f g h
  L M N O P f g h                 3         M N O P e f g h
  M N O P e f g h                 4         L M N O P f g h
  N O P d e f g h                 5         K L M N O P g h
  O P c d e f g h                 6         J K L M N O P h
  P b c d e f g h                 7         I J K L M N O P

Restrictions: 
None 


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^{3})
if BigEndianMem = 0 then
    pAddr ← pAddr_{PSIZE-1..3} || 0^{3}
endif
byte ← vAddr_{2..0} xor BigEndianCPU^{3}
memdouble ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
GPR[rt] ← memdouble_{7+8*byte..0} || GPR[rt]_{55-8*byte..0}
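
The merge in the last line of the operation can be tried out with a small Python model. This is a sketch under simplifying assumptions: memory is a flat byte array, the aligned doubleword is read whole, and translation and exceptions are ignored. The names `ldl`, `mem`, and `reg` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def ldl(reg: int, mem: bytes, vaddr: int, big_endian: bool = True) -> int:
    """Return the new 64-bit value of GPR rt after LDL at vaddr."""
    aligned = vaddr & ~7
    order = "big" if big_endian else "little"
    memdouble = int.from_bytes(mem[aligned:aligned + 8], order)
    # byte <- vAddr[2..0] xor BigEndianCPU^3
    byte = (vaddr & 7) ^ (7 if big_endian else 0)
    loaded_bits = 8 * (byte + 1)   # memdouble[7+8*byte..0]
    kept_bits = 64 - loaded_bits   # GPR[rt][55-8*byte..0]
    loaded = memdouble & ((1 << loaded_bits) - 1)
    kept = reg & ((1 << kept_bits) - 1)
    return ((loaded << kept_bits) | kept) & MASK64


mem = bytes(range(16))                     # each byte holds its own address
reg = int.from_bytes(b"abcdefgh", "big")   # initial register contents
print(f"{ldl(reg, mem, 2):016x}")          # bytes 2..7 loaded, g and h kept
```

Feeding the model the memory contents I..P reproduces the rows of Table 1-33 for both byte orderings.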


Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error
Reserved Instruction
LDR Load Doubleword Right
31 26 25 21 20 16 15 0
LDR base rt offset
011011
6 5 5 16
Format: LDR rt, offset(base) MIPS III
Purpose: To load the least-significant part of a doubleword from an unaligned memory 
address. 


Description: rt ← rt MERGE memory[base+offset]


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the least-significant of eight consecutive bytes forming a 
doubleword in memory (DW) starting at an arbitrary byte boundary. A part of DW, the least- 
significant one to eight bytes, is in the aligned doubleword containing EffAddr. This part of DW 
is loaded appropriately into the least-significant (right) part of GPR rt leaving the remainder of 
GPR rt unchanged. 


The figure below illustrates this operation for big-endian byte ordering. The eight consecutive 
bytes in 2..9 form an unaligned doubleword starting at location 2. A part of DW, two bytes, is 
contained in the aligned doubleword containing the least-significant byte at 9. First, LDR loads 
these two bytes into the right part of the destination register and leaves the remainder of the 
destination unchanged. Next, the complementary LDL loads the remainder of the unaligned 
doubleword. 


[Figure: doubleword at byte 2 in memory, big-endian byte order; each memory byte contains its address. Rows show memory bytes 0 to 15 (most to least significance), the initial contents of GPR 24, the contents after executing LDR $24,9($0), and the contents after LDL $24,2($0).]

Figure 1-3 Unaligned Doubleword Load using LDR and LDL


The bytes loaded from memory to the destination register depend on both the offset of the
effective address within an aligned doubleword, i.e. the low three bits of the address (vAddr_{2..0}),
and the current byte ordering mode of the processor (big- or little-endian). The table below
shows the bytes loaded for every combination of offset and byte ordering.


Table 1-34 Bytes Loaded by LDR Instruction

Memory contents and byte offsets (vAddr_{2..0}), most to least significance:
  big-endian offset      0  1  2  3  4  5  6  7
  memory contents        I  J  K  L  M  N  O  P
  little-endian offset   7  6  5  4  3  2  1  0

Initial contents of destination register, most to least significance:
  a  b  c  d  e  f  g  h

Destination register contents after instruction (lowercase bytes are unchanged):

  Big-endian byte ordering   vAddr_{2..0}   Little-endian byte ordering
  a b c d e f g I                 0         I J K L M N O P
  a b c d e f I J                 1         a I J K L M N O
  a b c d e I J K                 2         a b I J K L M N
  a b c d I J K L                 3         a b c I J K L M
  a b c I J K L M                 4         a b c d I J K L
  a b I J K L M N                 5         a b c d e I J K
  a I J K L M N O                 6         a b c d e f I J
  I J K L M N O P                 7         a b c d e f g I

Restrictions: 
None 


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^{3})
if BigEndianMem = 1 then
    pAddr ← pAddr_{PSIZE-1..3} || 0^{3}
endif
byte ← vAddr_{2..0} xor BigEndianCPU^{3}
memdouble ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
GPR[rt] ← GPR[rt]_{63..64-8*byte} || memdouble_{63..8*byte}
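
A small Python model of the final merge step may help when reading the table. It is a sketch only: memory is a flat byte array, the aligned doubleword is read whole, and translation and exceptions are left out. The names `ldr`, `mem`, and `reg` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def ldr(reg: int, mem: bytes, vaddr: int, big_endian: bool = True) -> int:
    """Return the new 64-bit value of GPR rt after LDR at vaddr."""
    aligned = vaddr & ~7
    order = "big" if big_endian else "little"
    memdouble = int.from_bytes(mem[aligned:aligned + 8], order)
    # byte <- vAddr[2..0] xor BigEndianCPU^3
    byte = (vaddr & 7) ^ (7 if big_endian else 0)
    kept_bits = 8 * byte                               # GPR[rt][63..64-8*byte]
    keep_mask = MASK64 & ~((1 << (64 - kept_bits)) - 1)
    loaded = memdouble >> kept_bits                    # memdouble[63..8*byte]
    return (reg & keep_mask) | loaded


mem = bytes(range(16))                     # each byte holds its own address
reg = int.from_bytes(b"abcdefgh", "big")   # initial register contents
print(f"{ldr(reg, mem, 9):016x}")          # a..f kept, bytes 8 and 9 loaded
# Starting from a register that already holds bytes 2..7 on the left
# (what the complementary LDL at address 2 would leave behind):
print(f"{ldr(0x0203040506076768, mem, 9):016x}")
```

The second call shows how the pair assembles the unaligned doubleword at bytes 2..9.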


Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error
Reserved Instruction
LH Load Halfword
31 26 25 21 20 16 15 0
LH base rt offset
100001
6 5 5 16
Format: LH rt, offset(base) MIPS I
Purpose: To load a halfword from memory as a signed value. 


Description: rt ← memory[base+offset]


The contents of the 16-bit halfword at the memory location specified by the aligned effective 
address are fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to 
the contents of GPR base to form the effective address. 


Restrictions: 


The effective address must be naturally aligned. If the least-significant bit of the address is
non-zero, an Address Error exception occurs.


MIPS IV: The low-order bit of the offset field must be zero. If it is not, the result of the 
instruction is undefined. 


Operation: 32-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{0}) ≠ 0 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0))
memword ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr_{1..0} xor (BigEndianCPU || 0)
GPR[rt] ← sign_extend(memword_{15+8*byte..8*byte})

Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{0}) ≠ 0 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian^{2} || 0))
memdouble ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor (BigEndianCPU^{2} || 0)
GPR[rt] ← sign_extend(memdouble_{15+8*byte..8*byte})
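
The byte-lane selection in the 64-bit operation can be followed with a short Python model. It is a sketch under simplifying assumptions (flat byte-array memory, no translation); the name `lh64` is hypothetical, and a Python exception stands in for the Address Error.

```python
MASK64 = (1 << 64) - 1


def lh64(mem: bytes, vaddr: int, big_endian: bool = True) -> int:
    """Model LH on a 64-bit processor: pick a halfword lane and sign-extend."""
    if vaddr & 1:
        raise ValueError("Address Error: halfword address must be even")
    aligned = vaddr & ~7
    order = "big" if big_endian else "little"
    memdouble = int.from_bytes(mem[aligned:aligned + 8], order)
    # byte <- vAddr[2..0] xor (BigEndianCPU^2 || 0)
    byte = (vaddr & 7) ^ (0b110 if big_endian else 0)
    half = (memdouble >> (8 * byte)) & 0xFFFF   # memdouble[15+8*byte..8*byte]
    if half & 0x8000:
        half |= MASK64 & ~0xFFFF                # replicate bit 15 upward
    return half


mem = bytes([0x00, 0x01, 0x80, 0x02, 0x00, 0x00, 0x00, 0x00])
print(f"{lh64(mem, 2):016x}")                    # big-endian: halfword 0x8002
print(f"{lh64(mem, 2, big_endian=False):016x}")  # little-endian: halfword 0x0280
```

Replacing the sign-extension branch with nothing gives the LHU behaviour.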


Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error
LHU Load Halfword Unsigned
31 26 25 21 20 16 15 0
LHU base rt offset
100101
6 5 5 16
Format: LHU rt, offset(base) MIPS I
Purpose: To load a halfword from memory as an unsigned value. 


Description: rt ← memory[base+offset]


The contents of the 16-bit halfword at the memory location specified by the aligned effective 
address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to 
the contents of GPR base to form the effective address. 


Restrictions: 


The effective address must be naturally aligned. If the least-significant bit of the address is
non-zero, an Address Error exception occurs.


MIPS IV: The low-order bit of the offset field must be zero. If it is not, the result of the 
instruction is undefined. 


Operation: 32-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{0}) ≠ 0 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0))
memword ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr_{1..0} xor (BigEndianCPU || 0)
GPR[rt] ← zero_extend(memword_{15+8*byte..8*byte})

Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{0}) ≠ 0 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian^{2} || 0))
memdouble ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor (BigEndianCPU^{2} || 0)
GPR[rt] ← zero_extend(memdouble_{15+8*byte..8*byte})


Exceptions: 
TLB Refill, TLB Invalid 
Address Error 
LL Load Linked Word
31 26 25 21 20 16 15 0
LL base rt offset
110000
6 5 5 16
Format: LL rt, offset(base) MIPS II
Purpose: To load a word from memory for an atomic read-modify-write. 


Description: rt ← memory[base+offset]


The LL and SC instructions provide primitives to implement atomic Read-Modify-Write 
(RMW) operations for cached memory locations. 


The 16-bit signed offset is added to the contents of GPR base to form an effective address. 


The contents of the 32-bit word at the memory location specified by the aligned effective 
address are fetched, sign-extended to the GPR register length if necessary, and written into 
GPR rt. This begins a RMW sequence on the current processor. 


There is one active RMW sequence per processor. When an LL is executed it starts the active 
RMW sequence replacing any other sequence that was active. 


The RMW sequence is completed by a subsequent SC instruction that either completes the 
RMW sequence atomically and succeeds, or does not and fails. See the description of SC for a 
list of events and conditions that cause the SC to fail and an example instruction sequence using 
LL and SC. 


Executing LL on one processor does not cause an action that, by itself, would cause an SC for 
the same block to fail on another processor. 


An execution of LL does not have to be followed by execution of SC; a program is free to 
abandon the RMW sequence without attempting a write. 
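
The flow of an RMW sequence can be pictured with a toy Python model. This is only a sketch: the class `ToyCPU`, its `break_sequence` method, and the simplified SC rule (store succeeds only while the link bit is still set) are assumptions made for illustration. The actual conditions under which SC fails are those listed in the description of SC.

```python
class ToyCPU:
    """One processor with a single link bit, mirroring LLbit in the operation."""

    def __init__(self, memory: dict):
        self.memory = memory   # hypothetical shared memory: address -> word
        self.llbit = 0

    def ll(self, addr: int) -> int:
        self.llbit = 1         # starts (or replaces) the active RMW sequence
        return self.memory[addr]

    def sc(self, addr: int, value: int) -> bool:
        # Assumed SC behaviour: write only if the sequence is still intact.
        success = self.llbit == 1
        if success:
            self.memory[addr] = value
        self.llbit = 0
        return success

    def break_sequence(self) -> None:
        # Stand-in for any event that makes a later SC fail.
        self.llbit = 0


def atomic_increment(cpu: ToyCPU, addr: int) -> int:
    """Retry LL/SC until the store succeeds; return the value that was read."""
    while True:
        old = cpu.ll(addr)
        if cpu.sc(addr, old + 1):
            return old


memory = {0x100: 41}
cpu = ToyCPU(memory)
value = cpu.ll(0x100)
cpu.break_sequence()                    # sequence disturbed before the store
print(cpu.sc(0x100, value + 1))         # store is refused, memory unchanged
print(atomic_increment(cpu, 0x100), memory[0x100])
```

A program may also call `ll` and never call `sc`, which corresponds to abandoning the sequence.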


Restrictions: 


The addressed location must be cached; if it is not, the result is undefined (see 1.6 Memory 
Access Types). 


The effective address must be naturally aligned. If either of the two least-significant bits of the
effective address is non-zero, an Address Error exception occurs.


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^{2} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
memword ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
GPR[rt] ← memword
LLbit ← 1


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^{2} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^{2}))
memdouble ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor (BigEndianCPU || 0^{2})
GPR[rt] ← sign_extend(memdouble_{31+8*byte..8*byte})
LLbit ← 1


Exceptions:
TLB Refill, TLB Invalid
Address Error
Reserved Instruction


Programming Notes:

There is no Load Linked Word Unsigned operation corresponding to Load Word Unsigned.


Implementation Notes:


An LL on one processor must not take action that, by itself, would cause an SC for the same 
block on another processor to fail. If an implementation depends on retaining the data in cache 
during the RMW sequence, cache misses caused by LL must not fetch data in the exclusive 
state, thus removing it from the cache, if it is present in another cache. 
LLD Load Linked Doubleword
31 26 25 21 20 16 15 0
LLD base rt offset
110100
6 5 5 16
Format: LLD rt, offset(base) MIPS III
Purpose: To load a doubleword from memory for an atomic read-modify-write. 


Description: rt ← memory[base+offset]

The LLD and SCD instructions provide primitives to implement atomic Read-Modify-Write 
(RMW) operations for cached memory locations. 


The 16-bit signed offset is added to the contents of GPR base to form an effective address. 


The contents of the 64-bit doubleword at the memory location specified by the aligned effective 
address are fetched and written into GPR rt. This begins a RMW sequence on the current 
processor. 


There is one active RMW sequence per processor. When an LLD is executed it starts the active 
RMW sequence replacing any other sequence that was active. 


The RMW sequence is completed by a subsequent SCD instruction that either completes the 
RMW sequence atomically and succeeds, or does not and fails. See the description of SCD for 
a list of events and conditions that cause the SCD to fail and an example instruction sequence 
using LLD and SCD. 


Executing LLD on one processor does not cause an action that, by itself, would cause an SCD 
for the same block to fail on another processor. 


An execution of LLD does not have to be followed by execution of SCD; a program is free to 
abandon the RMW sequence without attempting a write. 
Restrictions: 


The addressed location must be cached; if it is not, the result is undefined (see 1.6 Memory 
Access Types). 


The effective address must be naturally aligned. If any of the three least-significant bits of
the effective address is non-zero, an Address Error exception occurs.


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 




Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^{3} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
memdouble ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
GPR[rt] ← memdouble
LLbit ← 1


Exceptions:
TLB Refill, TLB Invalid
Address Error
Reserved Instruction

Programming Notes: 


Implementation Notes: 


An LLD on one processor must not take action that, by itself, would cause an SCD for the same 
block on another processor to fail. If an implementation depends on retaining the data in cache 
during the RMW sequence, cache misses caused by LLD must not fetch data in the exclusive 
state, thus removing it from the cache, if it is present in another cache. 
LUI Load Upper Immediate
31 26 25 21 20 16 15 0
LUI 0 rt immediate
001111 00000
6 5 5 16
Format: LUI rt, immediate MIPS I
Purpose: To load a constant into the upper half of a word. 


Description: rt ← immediate || 0^{16}


The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. 
The 32-bit result is sign-extended and placed into GPR rt. 
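
The shift and sign-extension can be written out as a few lines of Python. This is a minimal sketch; the function name `lui` and the `reg_bits` parameter are hypothetical and only model the register width.

```python
def lui(immediate: int, reg_bits: int = 64) -> int:
    """Model LUI: immediate || 0^16, sign-extended to the register width."""
    word = (immediate & 0xFFFF) << 16
    if word & 0x80000000:
        # Copy bit 31 into every higher register bit (none on a 32-bit register).
        word |= ((1 << reg_bits) - 1) & ~0xFFFFFFFF
    return word


print(hex(lui(0x1234)))       # upper half set, lower half zero
print(hex(lui(0x8000)))       # bit 31 set, so a 64-bit register is filled with ones above it
print(hex(lui(0x8000, 32)))   # on a 32-bit register there is nothing to extend
```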


Restrictions: 


None 


Operation: 
GPR[rt] ← sign_extend(immediate || 0^{16})


Exceptions: 


None 
LW Load Word
31 26 25 21 20 16 15 0
LW base rt offset
100011
6 5 5 16
Format: LW rt, offset(base) MIPS I
Purpose: To load a word from memory as a signed value. 


Description: rt ← memory[base+offset]


The contents of the 32-bit word at the memory location specified by the aligned effective 
address are fetched, sign-extended to the GPR register length if necessary, and placed in GPR rt. 
The 16-bit signed offset is added to the contents of GPR base to form the effective address. 


Restrictions: 


The effective address must be naturally aligned. If either of the two least-significant bits of the
address is non-zero, an Address Error exception occurs.


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^{2} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
memword ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
GPR[rt] ← memword


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^{2} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^{2}))
memdouble ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor (BigEndianCPU || 0^{2})
GPR[rt] ← sign_extend(memdouble_{31+8*byte..8*byte})


Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error
LWCz Load Word To Coprocessor
31 26 25 21 20 16 15 0
LWCz base rt offset
1100zz
6 5 5 16
Format: LWC1 rt, offset(base) MIPS I
LWC2 rt, offset(base)
LWC3 rt, offset(base)


Purpose: To load a word from memory to a coprocessor general register. 


Description: rt ← memory[base+offset]


The contents of the 32-bit word at the memory location specified by the aligned effective 
address are fetched and made available to coprocessor unit zz. The 16-bit signed offset is added 
to the contents of GPR base to form the effective address. 


The manner in which each coprocessor uses the data is defined by the individual coprocessor 
specification. The usual operation would place the data into coprocessor general register rt. 


Each MIPS architecture level defines up to 4 coprocessor units, numbered 0 to 3 (see 
1.2.5 Coprocessor Instructions). The opcodes corresponding to coprocessors that are not 
defined by an architecture level may be used for other instructions. 


Restrictions: 


Access to the coprocessors is controlled by system software. Each coprocessor has a 
“coprocessor usable” bit in the System Control coprocessor. The usable bit must be set for a 
user program to execute a coprocessor instruction. If the usable bit is not set, an attempt to 
execute the instruction will result in a Coprocessor Unusable exception. An unimplemented 
coprocessor must never be enabled. The result of executing this instruction for an 
unimplemented coprocessor when the usable bit is set, is undefined. 


This instruction is not available for coprocessor 0, the System Control coprocessor, and the 
opcode may be used for other instructions. 


The effective address must be naturally aligned. If either of the two least-significant bits of the
address is non-zero, an Address Error exception occurs.


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit processors
I:   vAddr ← sign_extend(offset) + GPR[base]
     if (vAddr_{1..0}) ≠ 0^{2} then SignalException(AddressError) endif
     (pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
     memword ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
I+1: COP_LW (z, rt, memword)


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^{2} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^{2}))
memdouble ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor (BigEndianCPU || 0^{2})
memword ← memdouble_{31+8*byte..8*byte}
COP_LW (z, rt, memdouble)


Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error
Coprocessor Unusable
LWL Load Word Left
31 26 25 21 20 16 15 0
LWL base rt offset
100010
6 5 5 16
Format: LWL rt, offset(base) MIPS I
Purpose: To load the most-significant part of a word as a signed value from an unaligned
memory address.


Description: rt ← rt MERGE memory[base+offset]


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the most-significant of four consecutive bytes forming a 
word in memory (W) starting at an arbitrary byte boundary. A part of W, the most-significant 
one to four bytes, is in the aligned word containing EffAddr. This part of W is loaded into the 
most-significant (left) part of the word in GPR rt. The remaining least-significant part of the 
word in GPR rt is unchanged. 


If GPR rt is a 64-bit register, the destination word is the low-order word of the register. The 
loaded value is treated as a signed value; the word sign bit (bit 31) is always loaded from 
memory and the new sign bit value is copied into bits 63..32. 


Word at byte 2 in memory, big-endian byte order; each memory byte contains its address.

  Memory initial contents (most to least significance):   0 1 2 3 4 5 6 7 8 9

                                      64-bit GPR 24                           32-bit GPR 24
  Initial contents                    a b c d e f g h                         e f g h
  After executing LWL $24,2($0)       sign bit (31) extended | 2 3 g h        2 3 g h
  Then after LWR $24,5($0)            sign bit (31) extended | 2 3 4 5        2 3 4 5

Figure 1-4 Unaligned Word Load using LWL and LWR


The figure above illustrates this operation for big-endian byte ordering for 32-bit and 64-bit
registers. The four consecutive bytes in 2..5 form an unaligned word starting at location 2. A
part of W, two bytes, is in the aligned word containing the most-significant byte at 2. First, LWL
loads these two bytes into the left part of the destination register word and leaves the right part
of the destination word unchanged. Next, the complementary LWR loads the remainder of the
unaligned word.


The bytes loaded from memory to the destination register depend on both the offset of the
effective address within an aligned word, i.e. the low two bits of the address (vAddr_{1..0}), and the
current byte ordering mode of the processor (big- or little-endian). The table below shows the
bytes loaded for every combination of offset and byte ordering.


Table 1-35 Bytes Loaded by LWL Instruction

Memory contents and byte offsets (vAddr_{1..0}), most to least significance:
  big-endian offset      0  1  2  3
  memory contents        I  J  K  L
  little-endian offset   3  2  1  0

Initial contents of destination register, most to least significance:
  64-bit register   a b c d e f g h
  32-bit register           e f g h

Destination 64-bit register contents after instruction (lowercase bytes are unchanged):

  Big-endian byte ordering             vAddr_{1..0}   Little-endian byte ordering
  bits 63..32              bits 31..0                 bits 63..32              bits 31..0
  sign bit (31) extended   I J K L          0         sign bit (31) extended   L f g h
  sign bit (31) extended   J K L h          1         sign bit (31) extended   K L g h
  sign bit (31) extended   K L g h          2         sign bit (31) extended   J K L h
  sign bit (31) extended   L f g h          3         sign bit (31) extended   I J K L

The word sign bit (31) is always loaded and its value is copied into bits 63..32.

Destination 32-bit register contents after instruction:

  Big-endian   vAddr_{1..0}   Little-endian
  I J K L           0         L f g h
  J K L h           1         K L g h
  K L g h           2         J K L h
  L f g h           3         I J K L


The unaligned loads, LWL and LWR, are exceptions to the load-delay scheduling restriction in
the MIPS I architecture. An unaligned load instruction to GPR rt that immediately follows
another load to GPR rt can “read” the loaded data. It will correctly merge the 1 to 4 loaded bytes
with the data loaded by the previous instruction.


Restrictions: 


MIPS I scheduling restriction: The loaded data is not available for use by the following 
instruction. The instruction immediately following this one, unless it is an unaligned load 
(LWL, LWR), may not use GPR rt as a source register. If this restriction is violated, the result 
of the operation is undefined. 


Operation: 32-bit processors
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor ReverseEndian^{2})
if BigEndianMem = 0 then
    pAddr ← pAddr_{PSIZE-1..2} || 0^{2}
endif
byte ← vAddr_{1..0} xor BigEndianCPU^{2}
memword ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
GPR[rt] ← memword_{7+8*byte..0} || GPR[rt]_{23-8*byte..0}
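
The 32-bit merge can be exercised with a short Python model. It is a sketch under simplifying assumptions: memory is a flat byte array, the aligned word is read whole, and translation and exceptions are ignored. The name `lwl` and the `reg_bits` switch are hypothetical; for a 64-bit register the model applies the rule from the description that bit 31 is copied into bits 63..32.

```python
def lwl(reg: int, mem: bytes, vaddr: int,
        big_endian: bool = True, reg_bits: int = 32) -> int:
    """Return the new value of GPR rt after LWL at vaddr."""
    aligned = vaddr & ~3
    order = "big" if big_endian else "little"
    memword = int.from_bytes(mem[aligned:aligned + 4], order)
    # byte <- vAddr[1..0] xor BigEndianCPU^2
    byte = (vaddr & 3) ^ (3 if big_endian else 0)
    loaded_bits = 8 * (byte + 1)   # memword[7+8*byte..0]
    kept_bits = 32 - loaded_bits   # GPR[rt][23-8*byte..0]
    loaded = memword & ((1 << loaded_bits) - 1)
    word = (loaded << kept_bits) | (reg & ((1 << kept_bits) - 1))
    if reg_bits == 64 and word & 0x80000000:
        word |= 0xFFFFFFFF00000000  # copy the word sign bit into bits 63..32
    return word


mem = bytes(range(10))                  # each byte holds its own address
reg = int.from_bytes(b"efgh", "big")    # initial 32-bit register contents
print(f"{lwl(reg, mem, 2):08x}")        # bytes 2 and 3 loaded, g and h kept
```

Using the memory contents I..L as input reproduces the 32-bit rows of Table 1-35.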


Operation: 64-bit processors
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^{3})
if BigEndianMem = 0 then
    pAddr ← pAddr_{PSIZE-1..3} || 0^{3}
endif 
byte ← 0 || (vAddr_{1..0} xor BigEndianCPU^{2})
word ← vAddr_{2} xor BigEndianCPU
memdouble ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
temp ← memdouble_{31+32*word-8*byte..32*word} || GPR[rt]_{23-8*byte..0}
GPR[rt] ← (temp_{31})^{32} || temp


Exceptions:
TLB Refill, TLB Invalid
Bus Error
Address Error


Programming Notes: 


The architecture provides no direct support for treating unaligned words as unsigned values, i.e. 
zeroing bits 63..32 of the destination register when bit 31 is loaded. See SLL or SLLV for a 
single-instruction method of propagating the word sign bit in a register into the upper half of a 
64-bit register. 
LWR Load Word Right
31 26 25 21 20 16 15 0
LWR base rt offset
100110
6 5 5 16
Format: LWR rt, offset(base) MIPS I
Purpose: To load the least-significant part of a word from an unaligned memory address as
a signed value.


Description: rt ← rt MERGE memory[base+offset]


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the least-significant of four consecutive bytes forming a 
word in memory (W) starting at an arbitrary byte boundary. A part of W, the least-significant 
one to four bytes, is in the aligned word containing EffAddr. This part of W is loaded into the 
least-significant (right) part of the word in GPR rt. The remaining most-significant part of the 
word in GPR rt is unchanged. 


If GPR rt is a 64-bit register, the destination word is the low-order word of the register. The 
loaded value is treated as a signed value; if the word sign bit (bit 31) is loaded (i.e. when all four 
bytes are loaded) then the new sign bit value is copied into bits 63..32. If bit 31 is not loaded 
then the value of bits 63..32 is implementation dependent; the value is either unchanged or a 
copy of the current value of bit 31. Executing both LWR and LWL, in either order, delivers
a sign-extended word value in the destination register.
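
The right-hand merge on a 32-bit register can be modelled in a few lines of Python. This is a sketch only: the byte index is assumed to follow the same rule as for LWL (vAddr_{1..0} xor BigEndianCPU^{2}), memory is a flat byte array, and bits 63..32 of a 64-bit register are left out because their value is implementation dependent when bit 31 is not loaded. The name `lwr` is hypothetical.

```python
def lwr(reg: int, mem: bytes, vaddr: int, big_endian: bool = True) -> int:
    """Return the new 32-bit value of GPR rt after LWR at vaddr."""
    aligned = vaddr & ~3
    order = "big" if big_endian else "little"
    memword = int.from_bytes(mem[aligned:aligned + 4], order)
    byte = (vaddr & 3) ^ (3 if big_endian else 0)
    kept_bits = 8 * byte   # most-significant register bytes left unchanged
    keep_mask = 0xFFFFFFFF & ~((1 << (32 - kept_bits)) - 1)
    return (reg & keep_mask) | (memword >> kept_bits)


mem = bytes(range(10))                  # each byte holds its own address
reg = int.from_bytes(b"efgh", "big")    # initial 32-bit register contents
print(f"{lwr(reg, mem, 5):08x}")        # e and f kept, bytes 4 and 5 loaded
```

The printed value corresponds to the state after the first step of the big-endian example, before the complementary LWL fills in the left part.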


Word at byte 2 in memory, big-endian byte order; each memory byte contains its address.

  Memory initial contents (most to least significance):   0 1 2 3 4 5 6 7 8 9

                                      64-bit GPR 24                             32-bit GPR 24
  Initial contents                    a b c d e f g h                           e f g h
  After executing LWR $24,5($0)       no change or sign-extend | e f 4 5        e f 4 5
  Then after LWL $24,2($0)            sign bit (31) extended | 2 3 4 5          2 3 4 5

Figure 1-5 Unaligned Word Load using LWR and LWL


The figure above illustrates this operation for big-endian byte ordering for 32-bit and 64-bit 
registers. The four consecutive bytes in 2..5 form an unaligned word starting at location 2. A 
part of W, two bytes, is in the aligned word containing the least-significant byte at 5. First, LWR 
loads these two bytes into the right part of the destination register. Next, the complementary
LWL loads the remainder of the unaligned word.
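
The two-instruction sequence described above can be modelled in C. The following is a minimal sketch, assuming a 32-bit destination register and big-endian byte ordering. The function names lwr_be32 and lwl_be32 and the byte-array view of memory are illustrative assumptions, not part of the architecture definition.

```c
#include <stdint.h>

/* Hypothetical model of LWR (32-bit register, big-endian).
 * Loads the bytes from the start of the aligned word up to and
 * including addr into the right (least-significant) part of rt. */
uint32_t lwr_be32(uint32_t rt, const uint8_t *mem, uint32_t addr)
{
    uint32_t n = (addr & 3u) + 1u;          /* bytes loaded: 1..4 */
    uint32_t aligned = addr & ~3u;          /* start of the aligned word */
    uint32_t loaded = 0;
    for (uint32_t i = 0; i < n; i++)
        loaded = (loaded << 8) | mem[aligned + i];
    if (n == 4u)
        return loaded;                      /* whole word replaced */
    uint32_t mask = (1u << (8u * n)) - 1u;  /* low n bytes of rt */
    return (rt & ~mask) | loaded;
}

/* Hypothetical model of LWL (32-bit register, big-endian).
 * Loads the bytes from addr to the end of the aligned word
 * into the left (most-significant) part of rt. */
uint32_t lwl_be32(uint32_t rt, const uint8_t *mem, uint32_t addr)
{
    uint32_t n = 4u - (addr & 3u);          /* bytes loaded: 1..4 */
    uint32_t loaded = 0;
    for (uint32_t i = 0; i < n; i++)
        loaded = (loaded << 8) | mem[addr + i];
    if (n == 4u)
        return loaded;
    uint32_t shift = 8u * (4u - n);         /* bits of rt left unchanged */
    uint32_t keep = (1u << shift) - 1u;
    return (loaded << shift) | (rt & keep);
}
```

With each memory byte holding its own address and rt initially 0x65666768, lwr_be32(rt, mem, 5) followed by lwl_be32(rt, mem, 2) reproduces the 32-bit case of Figure 1-5.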


The bytes loaded from memory to the destination register depend on both the offset of the 
effective address within an aligned word, i.e. the low two bits of the address (vAddr_{1..0}), and the
current byte ordering mode of the processor (big- or little-endian). The table below shows the 
bytes loaded for every combination of offset and byte ordering. 


Table 1-36 Bytes Loaded by LWR Instruction


Memory contents and byte offsets

    most  - significance -  least
        | I | J | K | L |
          0   1   2   3        ← big-endian offset (vAddr_{1..0})
          3   2   1   0        ← little-endian offset (vAddr_{1..0})


Initial contents of destination register

                        most      - significance -      least
    64-bit register     | a | b | c | d | e | f | g | h |
    32-bit register                     | e | f | g | h |


Destination 64-bit register contents after instruction
(lowercase letters are unchanged register bytes)

                   Big-endian byte ordering                 Little-endian byte ordering
    vAddr_{1..0}   bits 63..32                bits 31..0    bits 63..32                bits 31..0
    0              no change or sign-extend   e f g I       sign bit (31) extended     I J K L
    1              no change or sign-extend   e f I J       no change or sign-extend   e I J K
    2              no change or sign-extend   e I J K       no change or sign-extend   e f I J
    3              sign bit (31) extended     I J K L       no change or sign-extend   e f g I

When the word sign bit (31) is loaded, its value is copied into bits 63..32. When it
is not loaded, the behavior is implementation specific: bits 63..32 are either
unchanged, or the value of the unloaded bit 31 is copied into them.


Destination 32-bit register contents after instruction

    vAddr_{1..0}   Big-endian   Little-endian
    0              e f g I      I J K L
    1              e f I J      e I J K
    2              e I J K      e f I J
    3              I J K L      e f g I



The unaligned loads, LWL and LWR, are exceptions to the load-delay scheduling restriction in 
the MIPS I architecture. An unaligned load to GPR rt that immediately follows another load to 
GPR rt can “read” the loaded data. It will correctly merge the 1 to 4 loaded bytes with the data
loaded by the previous instruction.


Restrictions: 


MIPS I scheduling restriction: The loaded data is not available for use by the following 
instruction. The instruction immediately following this one, unless it is an unaligned load 
(LWL, LWR), may not use GPR rt as a source register. If this restriction is violated, the result 
of the operation is undefined. 


Restrictions: 


None 


Operation: 32-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor ReverseEndian^2)
if BigEndianMem = 0 then
    pAddr ← pAddr_{PSIZE-1..2} || 0^2
endif
byte ← vAddr_{1..0} xor BigEndianCPU^2
memword ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
GPR[rt] ← memword_{31..32-8*byte} || GPR[rt]_{31-8*byte..0}


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^3)
if BigEndianMem = 1 then
    pAddr ← pAddr_{PSIZE-1..3} || 0^3
endif
byte ← vAddr_{1..0} xor BigEndianCPU^2
word ← vAddr_2 xor BigEndianCPU
memdouble ← LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
temp ← GPR[rt]_{31..32-8*byte} || memdouble_{31+32*word..32*word+8*byte}
if byte = 4 then
    utemp ← (temp_31)^32          /* loaded bit 31, must sign extend */
else
    one of the following two behaviors:
    utemp ← GPR[rt]_{63..32}      /* leave what was there alone */
    utemp ← (GPR[rt]_31)^32       /* sign-extend bit 31 */
endif
GPR[rt] ← utemp || temp


Exceptions: 

TLB Refill, TLB Invalid 

Bus Error 


Address Error 


Programming Notes: 


The architecture provides no direct support for treating unaligned words as unsigned values, i.e.
zeroing bits 63..32 of the destination register when bit 31 is loaded. See SLL or SLLV for a 
single-instruction method of propagating the word sign bit in a register into the upper half of a 
64-bit register. 
Load Word Unsigned                                               LWU


 31     26 25    21 20    16 15                              0
+---------+--------+--------+---------------------------------+
|   LWU   |  base  |   rt   |             offset              |
| 100111  |        |        |                                 |
+---------+--------+--------+---------------------------------+
     6        5        5                    16

Format: LWU rt, offset(base)                                  MIPS III

Purpose: To load a word from memory as an unsigned value.


Description: rt ← memory[base+offset]


The contents of the 32-bit word at the memory location specified by the aligned effective
address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to
the contents of GPR base to form the effective address.
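
The difference between the zero extension performed by LWU and the sign extension performed by a signed word load can be illustrated as follows. This is a minimal sketch in C; the function names lwu_extend and lw_extend are hypothetical, and the code models only the extension step, not address translation or alignment checking.

```c
#include <stdint.h>

/* Zero-extend a 32-bit memory word into a 64-bit register value,
 * as described for LWU: bits 63..32 of the result are always 0. */
uint64_t lwu_extend(uint32_t word)
{
    return (uint64_t)word;
}

/* For contrast: sign-extend the same word, copying bit 31
 * into bits 63..32 as a signed word load does. */
uint64_t lw_extend(uint32_t word)
{
    uint64_t v = (uint64_t)word;
    if (word & 0x80000000u)
        v |= 0xFFFFFFFF00000000ull;
    return v;
}
```

The two results differ only when bit 31 of the loaded word is set.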


Restrictions: 


The effective address must be naturally aligned. If either of the two least-significant bits of the
address is non-zero, an Address Error exception occurs.


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the
instruction is undefined.


Operation: 64-bit processors 
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^2))
memdouble ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
byte ← vAddr_{2..0} xor (BigEndianCPU || 0^2)
GPR[rt] ← 0^32 || memdouble_{31+8*byte..8*byte}


Exceptions: 
TLB Refill, TLB Invalid 
Bus Error 
Address Error 


Reserved Instruction 
MFC0                            Move From System Control Coprocessor

 31     26 25    21 20    16 15    11 10               0
+---------+--------+--------+--------+------------------+
|  COP0   |   MF   |   rt   |   rd   |        0         |
| 010000  | 00000  |        |        |  000 0000 0000   |
+---------+--------+--------+--------+------------------+
     6        5        5        5            11

Format: MFC0 rt, rd                                           MIPS I

Description:


The contents of coprocessor register rd of the CP0 are loaded into general register rt.


Operation: 32-bit processors
data ← CPR[0,rd]
GPR[rt] ← data

Operation: 64-bit processors
data ← CPR[0,rd]
GPR[rt] ← (data_31)^32 || data_{31..0}

Exceptions: 


Coprocessor unusable 
Move From HI Register                                            MFHI

 31     26 25             16 15    11 10     6 5       0
+---------+-----------------+--------+--------+---------+
| SPECIAL |        0        |   rd   |   0    |  MFHI   |
| 000000  |  00 0000 0000   |        | 00000  | 010000  |
+---------+-----------------+--------+--------+---------+
     6            10            5        5         6

Format: MFHI rd                                               MIPS I

Purpose: To copy the special purpose HI register to a GPR.


Description: rd ← HI

The contents of special register HI are loaded into GPR rd.

Restrictions: 


The two instructions that follow an MFHI instruction must not be instructions that modify the 
HI register: DDIV, DDIVU, DIV, DIVU, DMULT, DMULTU, MTHI, MULT, MULTU. If 
this restriction is violated, the result of the MFHI is undefined. 


Operation: 
GPR[rd] ← HI


Exceptions: 


None 
MFLO                                            Move From LO Register


 31     26 25             16 15    11 10     6 5       0
+---------+-----------------+--------+--------+---------+
| SPECIAL |        0        |   rd   |   0    |  MFLO   |
| 000000  |  00 0000 0000   |        | 00000  | 010010  |
+---------+-----------------+--------+--------+---------+
     6            10            5        5         6

Format: MFLO rd                                               MIPS I

Purpose: To copy the special purpose LO register to a GPR.


Description: rd ← LO

The contents of special register LO are loaded into GPR rd.


Restrictions: 


The two instructions that follow an MFLO instruction must not be instructions that modify the 
LO register: DDIV, DDIVU, DIV, DIVU, DMULT, DMULTU, MTLO, MULT, MULTU. If 
this restriction is violated, the result of the MFLO is undefined. 


Operation: 
GPR[rd] ← LO


Exceptions: 


None 
Move From Performance Counter (R10000 only)                      MFPC

 31     26 25    21 20    16 15    11 10     6 5      1  0
+---------+--------+--------+--------+--------+--------+---+
|  COP0   | 00000  |   rt   | 11001  | 00000  |  reg   | 1 |
| 010000  |        |        |        |        |        |   |
+---------+--------+--------+--------+--------+--------+---+
     6        5        5        5        5        5      1

Format: MFPC rt, reg

Description:


The contents of a performance counter reg of the CP0 are loaded into general register rt. Only
0 and 1 are valid for reg in the R10000 implementation.

Operation: 32-bit processors
data ← CPR[0,reg]
GPR[rt] ← data

Operation: 64-bit processors
data ← CPR[0,reg]
GPR[rt] ← (data_31)^32 || data_{31..0}

Exceptions: 


Coprocessor Unusable 
MFPS (R10000 only)               Move From Performance Event Specifier


 31     26 25    21 20    16 15    11 10     6 5      1  0
+---------+--------+--------+--------+--------+--------+---+
|  COP0   | 00000  |   rt   | 11001  | 00000  |  reg   | 0 |
| 010000  |        |        |        |        |        |   |
+---------+--------+--------+--------+--------+--------+---+
     6        5        5        5        5        5      1

Format: MFPS rt, reg

Description:


The contents of a performance event specifier reg of the CP0 are loaded into general register rt.
Only 0 and 1 are valid for reg in the R10000 implementation.

Operation: 32-bit processors
data ← CPR[0,reg]
GPR[rt] ← data

Operation: 64-bit processors
data ← CPR[0,reg]
GPR[rt] ← (data_31)^32 || data_{31..0}

Exceptions: 


Coprocessor Unusable 
Move Conditional on Not Zero                                     MOVN

 31     26 25    21 20    16 15    11 10     6 5       0
+---------+--------+--------+--------+--------+---------+
| SPECIAL |   rs   |   rt   |   rd   |   0    |  MOVN   |
| 000000  |        |        |        | 00000  | 001011  |
+---------+--------+--------+--------+--------+---------+
     6        5        5        5        5         6

Format: MOVN rd, rs, rt                                       MIPS IV

Purpose: To conditionally move a GPR after testing a GPR value.


Description: if (rt ≠ 0) then rd ← rs


If the value in GPR rt is not equal to zero, then the contents of GPR rs are placed into GPR rd. 


Restrictions: 
None 


Operation: 


if GPR[rt] ≠ 0 then
    GPR[rd] ← GPR[rs]
endif


Exceptions: 
Reserved Instruction 


Programming Notes: 


The nonzero value tested here is the “condition true” result from the SLT, SLTI, SLTU, and 
SLTIU comparison instructions. 
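
As an illustration of pairing a comparison with MOVN, the following C sketch models a branch-free signed maximum. The helper names slt_model, movn_model and max_movn are hypothetical; each models only the register result of the corresponding instruction.

```c
#include <stdint.h>

/* Models SLT: result is 1 if rs < rt (signed), else 0. */
int32_t slt_model(int32_t rs, int32_t rt)
{
    return rs < rt ? 1 : 0;
}

/* Models MOVN: returns the new value of rd.
 * rd is replaced by rs only when rt is nonzero. */
int32_t movn_model(int32_t rd, int32_t rs, int32_t rt)
{
    return rt != 0 ? rs : rd;
}

/* max(a, b) without a branch:
 *   rd = a;  t = SLT a, b;  MOVN rd, b, t  */
int32_t max_movn(int32_t a, int32_t b)
{
    int32_t rd = a;
    int32_t t = slt_model(a, b);   /* "condition true" when a < b */
    return movn_model(rd, b, t);
}
```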
MOVZ                                         Move Conditional on Zero


 31     26 25    21 20    16 15    11 10     6 5       0
+---------+--------+--------+--------+--------+---------+
| SPECIAL |   rs   |   rt   |   rd   |   0    |  MOVZ   |
| 000000  |        |        |        | 00000  | 001010  |
+---------+--------+--------+--------+--------+---------+
     6        5        5        5        5         6

Format: MOVZ rd, rs, rt                                       MIPS IV

Purpose: To conditionally move a GPR after testing a GPR value.


Description: if (rt = 0) then rd ← rs

If the value in GPR rt is equal to zero, then the contents of GPR rs are placed into GPR rd. 
Restrictions: 

None 


Operation: 


if GPR[rt] = 0 then 
    GPR[rd] ← GPR[rs]
endif 


Exceptions: 
Reserved Instruction 


Programming Notes: 


The zero value tested here is the “condition false” result from the SLT, SLTI, SLTU, and SLTIU 
comparison instructions. 
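
The complementary use of MOVZ can be sketched in the same way. The following C model computes a branch-free unsigned minimum; the names sltu_model, movz_model and min_movz are hypothetical and model only the register results.

```c
#include <stdint.h>

/* Models SLTU: result is 1 if rs < rt (unsigned), else 0. */
uint32_t sltu_model(uint32_t rs, uint32_t rt)
{
    return rs < rt ? 1u : 0u;
}

/* Models MOVZ: returns the new value of rd.
 * rd is replaced by rs only when rt is zero. */
uint32_t movz_model(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return rt == 0u ? rs : rd;
}

/* min(a, b) without a branch:
 *   rd = a;  t = SLTU a, b;  MOVZ rd, b, t  */
uint32_t min_movz(uint32_t a, uint32_t b)
{
    uint32_t rd = a;
    uint32_t t = sltu_model(a, b);  /* "condition false" (0) when a >= b */
    return movz_model(rd, b, t);
}
```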
Move To System Control Coprocessor                               MTC0

 31     26 25    21 20    16 15    11 10               0
+---------+--------+--------+--------+------------------+
|  COP0   |   MT   |   rt   |   rd   |        0         |
| 010000  | 00100  |        |        |  000 0000 0000   |
+---------+--------+--------+--------+------------------+
     6        5        5        5            11

Format: MTC0 rt, rd                                           MIPS I

Description:


The contents of general register rt are loaded into coprocessor register rd of CP0.


Operation:


data ← GPR[rt]
CPR[0,rd] ← data


Exceptions: 


Coprocessor Unusable 
MTHI                                              Move To HI Register


 31     26 25    21 20                       6 5       0
+---------+--------+--------------------------+---------+
| SPECIAL |   rs   |            0             |  MTHI   |
| 000000  |        |   000 0000 0000 0000     | 010001  |
+---------+--------+--------------------------+---------+
     6        5                15                  6

Format: MTHI rs                                               MIPS I


Purpose: To copy a GPR to the special purpose HI register.


Description: HI ← rs


The contents of GPR rs are loaded into special register HI. 


Restrictions: 


If either of the two preceding instructions is MFHI, the result of that MFHI is undefined. Reads 
of the HI or LO special registers must be separated from subsequent instructions that write to 
them by two or more other instructions. 


A computed result written to the HI/LO pair by DDIV, DDIVU, DIV, DIVU, DMULT,
DMULTU, MULT, or MULTU must be read by MFHI or MFLO before another result is written
into either HI or LO. If an MTHI instruction is executed following one of these arithmetic
instructions, but before an MFLO or MFHI instruction, the contents of LO are undefined. The
following example shows this illegal situation:

    MULT  r2,r4     # start operation that will eventually write to HI,LO
    ...             # code not containing mfhi or mflo
    MTHI  r6
    ...             # code not containing mflo
    MFLO  r3        # this mflo would get an undefined value


Operation: 


I-2:, I-1:   HI ← undefined
I:           HI ← GPR[rs]


Exceptions: 


None 


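Move To LO Register                                              MTLO


 31     26 25    21 20                       6 5       0
+---------+--------+--------------------------+---------+
| SPECIAL |   rs   |            0             |  MTLO   |
| 000000  |        |   000 0000 0000 0000     | 010011  |
+---------+--------+--------------------------+---------+
     6        5                15                  6

Format: MTLO rs                                               MIPS I


Purpose: To copy a GPR to the special purpose LO register.


Description: LO ← rs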


The contents of GPR rs are loaded into special register LO. 


Restrictions: 


If either of the two preceding instructions is MFLO, the result of that MFLO is undefined. 
Reads of the HI or LO special registers must be separated from subsequent instructions that
write to them by two or more other instructions. 


A computed result written to the HI/LO pair by DDIV, DDIVU, DIV, DIVU, DMULT,
DMULTU, MULT, or MULTU must be read by MFHI or MFLO before another result is written
into either HI or LO. If an MTLO instruction is executed following one of these arithmetic
instructions, but before an MFLO or MFHI instruction, the contents of HI are undefined. The
following example shows this illegal situation:

    MULT  r2,r4     # start operation that will eventually write to HI,LO
    ...             # code not containing mfhi or mflo
    MTLO  r6
    ...             # code not containing mfhi
    MFHI  r3        # this mfhi would get an undefined value


Operation: 


I-2:, I-1:   LO ← undefined
I:           LO ← GPR[rs]


Exceptions: 


None 
MTPC (R10000 only)                        Move To Performance Counter

 31     26 25    21 20    16 15    11 10     6 5      1  0
+---------+--------+--------+--------+--------+--------+---+
|  COP0   | 00100  |   rt   | 11001  | 00000  |  reg   | 1 |
| 010000  |        |        |        |        |        |   |
+---------+--------+--------+--------+--------+--------+---+
     6        5        5        5        5        5      1

Format: MTPC rt, reg

Description:


The contents of general register rt are loaded into a performance counter reg of CP0. Only 0 and
1 are valid for reg in the R10000 implementation.


Operation:


data ← GPR[rt]
CPR[0,reg] ← data


Exceptions: 


Coprocessor Unusable 
Move To Performance Event Specifier (R10000 only)                MTPS

 31     26 25    21 20    16 15    11 10     6 5      1  0
+---------+--------+--------+--------+--------+--------+---+
|  COP0   | 00100  |   rt   | 11001  | 00000  |  reg   | 0 |
| 010000  |        |        |        |        |        |   |
+---------+--------+--------+--------+--------+--------+---+
     6        5        5        5        5        5      1

Format: MTPS rt, reg

Description:


The contents of general register rt are loaded into a performance event specifier reg of CP0.
Only 0 and 1 are valid for reg in the R10000 implementation.


Operation:


data ← GPR[rt]
CPR[0,reg] ← data


Exceptions: 


Coprocessor Unusable 
MULT                                                    Multiply Word


 31     26 25    21 20    16 15             6 5       0
+---------+--------+--------+-----------------+---------+
| SPECIAL |   rs   |   rt   |        0        |  MULT   |
| 000000  |        |        |  00 0000 0000   | 011000  |
+---------+--------+--------+-----------------+---------+
     6        5        5            10             6

Format: MULT rs, rt                                           MIPS I


Purpose: To multiply 32-bit signed integers.


Description: (LO, HI) ← rs × rt


The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both 
operands as signed values, to produce a 64-bit result. The low-order 32-bit word of the result 
is placed into special register LO, and the high-order 32-bit word is placed into special register 
HI. 


No arithmetic exception occurs under any circumstances. 


Restrictions: 


On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them by two or more other instructions.


Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
I-2:, I-1:   LO, HI ← undefined
I:           prod ← GPR[rs]_{31..0} * GPR[rt]_{31..0}
             LO ← sign_extend(prod_{31..0})
             HI ← sign_extend(prod_{63..32})


Exceptions: 


None 


Programming Notes: 


In some processors the integer multiply operation may proceed asynchronously and allow other 
CPU instructions to execute before it is complete. An attempt to read LO or HI before the results
are written will wait (interlock) until the results are ready. Asynchronous execution does not 
affect the program result, but offers an opportunity for performance improvement by scheduling 
the multiply so that other instructions can execute in parallel. 


Programs that require overflow detection must check for it explicitly. 
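
One way such an explicit check can be written is sketched below in C. The model splits the 64-bit signed product into HI and LO as described above and reports overflow of the 32-bit signed range when HI is not the sign extension of bit 31 of LO. The type hilo_t and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the HI/LO pair (32-bit view). */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_t;

/* Models MULT: signed 32 x 32 -> 64-bit product split into HI and LO. */
hilo_t mult_model(int32_t rs, int32_t rt)
{
    int64_t prod = (int64_t)rs * (int64_t)rt;
    hilo_t r;
    r.lo = (uint32_t)((uint64_t)prod & 0xFFFFFFFFu);
    r.hi = (uint32_t)((uint64_t)prod >> 32);
    return r;
}

/* The product fits in a signed 32-bit word only if HI is all copies
 * of bit 31 of LO; otherwise report overflow (nonzero result). */
int mult_overflows32(hilo_t r)
{
    uint32_t sign_fill = (r.lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return r.hi != sign_fill;
}
```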
Multiply Unsigned Word                                          MULTU


 31     26 25    21 20    16 15             6 5       0
+---------+--------+--------+-----------------+---------+
| SPECIAL |   rs   |   rt   |        0        |  MULTU  |
| 000000  |        |        |  00 0000 0000   | 011001  |
+---------+--------+--------+-----------------+---------+
     6        5        5            10             6

Format: MULTU rs, rt                                          MIPS I


Purpose: To multiply 32-bit unsigned integers.


Description: (LO, HI) ← rs × rt


The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both 
operands as unsigned values, to produce a 64-bit result. The low-order 32-bit word of the result 
is placed into special register LO, and the high-order 32-bit word is placed into special register 
HI. 


No arithmetic exception occurs under any circumstances. 


Restrictions: 


On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


If either of the two preceding instructions is MFHI or MFLO, the result of the MFHI or MFLO
is undefined. Reads of the HI or LO special registers must be separated from subsequent
instructions that write to them by two or more other instructions.


Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
I-2:, I-1:   LO, HI ← undefined
I:           prod ← (0 || GPR[rs]_{31..0}) * (0 || GPR[rt]_{31..0})
             LO ← sign_extend(prod_{31..0})
             HI ← sign_extend(prod_{63..32})


Exceptions: 


None 


Programming Notes: 


In some processors the integer multiply operation may proceed asynchronously and allow other 
CPU instructions to execute before it is complete. An attempt to read LO or HI before the results
are written will wait (interlock) until the results are ready. Asynchronous execution does not 
affect the program result, but offers an opportunity for performance improvement by scheduling 
the multiply so that other instructions can execute in parallel. 


Programs that require overflow detection must check for it explicitly. 
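
For the unsigned case the explicit check is simpler: the product fits in 32 bits exactly when HI is zero. The following C sketch models this; the type hilo_u_t and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the HI/LO pair (32-bit view). */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_u_t;

/* Models MULTU: unsigned 32 x 32 -> 64-bit product split into HI and LO. */
hilo_u_t multu_model(uint32_t rs, uint32_t rt)
{
    uint64_t prod = (uint64_t)rs * (uint64_t)rt;
    hilo_u_t r;
    r.lo = (uint32_t)(prod & 0xFFFFFFFFu);
    r.hi = (uint32_t)(prod >> 32);
    return r;
}

/* The unsigned product overflows 32 bits whenever HI is nonzero. */
int multu_overflows32(hilo_u_t r)
{
    return r.hi != 0u;
}
```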
NOR                                                            Not Or


 31     26 25    21 20    16 15    11 10     6 5       0
+---------+--------+--------+--------+--------+---------+
| SPECIAL |   rs   |   rt   |   rd   |   0    |   NOR   |
| 000000  |        |        |        | 00000  | 100111  |
+---------+--------+--------+--------+--------+---------+
     6        5        5        5        5         6

Format: NOR rd, rs, rt                                        MIPS I

Purpose: To do a bitwise logical NOT OR.


Description: rd ← rs NOR rt


The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical NOR 
operation. The result is placed into GPR rd. 
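
Because there is no separate bitwise NOT instruction in this section, a NOR with a zero operand is a common way to complement a register. The following C sketch models a 32-bit NOR and that idiom; the function names nor32 and not32 are hypothetical.

```c
#include <stdint.h>

/* Models NOR on 32-bit values: the complement of the bitwise OR. */
uint32_t nor32(uint32_t rs, uint32_t rt)
{
    return ~(rs | rt);
}

/* Bitwise NOT expressed as NOR with a zero operand. */
uint32_t not32(uint32_t rs)
{
    return nor32(rs, 0u);
}
```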


Restrictions: 


None 


Operation: 
GPR[rd] ← GPR[rs] nor GPR[rt]


Exceptions: 


None 
Or                                                                 OR


 31     26 25    21 20    16 15    11 10     6 5       0
+---------+--------+--------+--------+--------+---------+
| SPECIAL |   rs   |   rt   |   rd   |   0    |   OR    |
| 000000  |        |        |        | 00000  | 100101  |
+---------+--------+--------+--------+--------+---------+
     6        5        5        5        5         6

Format: OR rd, rs, rt                                         MIPS I

Purpose: To do a bitwise logical OR.


Description: rd ← rs OR rt


The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical OR 
operation. The result is placed into GPR rd. 


Restrictions: 


None 


Operation: 
GPR[rd] ← GPR[rs] or GPR[rt]


Exceptions: 


None 
Or Immediate                                                      ORI


 31     26 25    21 20    16 15                              0
+---------+--------+--------+---------------------------------+
|   ORI   |   rs   |   rt   |            immediate            |
| 001101  |        |        |                                 |
+---------+--------+--------+---------------------------------+
     6        5        5                    16

Format: ORI rt, rs, immediate                                 MIPS I

Purpose: To do a bitwise logical OR with a constant.


Description: rt ← rs OR immediate


The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in 
a bitwise logical OR operation. The result is placed into GPR rt. 
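
The effect of zero extension can be seen in the following C sketch of a 32-bit register model. The function ori_model follows the description above. The helper lui_model, which places a 16-bit immediate in the upper half of a word, is an assumption introduced only to show the familiar two-step construction of a 32-bit constant; it is not defined in this section.

```c
#include <stdint.h>

/* Models ORI on a 32-bit register: the immediate is zero-extended,
 * so only bits 15..0 of rs can be changed. */
uint32_t ori_model(uint32_t rs, uint16_t immediate)
{
    return rs | (uint32_t)immediate;
}

/* Assumed helper: immediate placed in bits 31..16, low half cleared. */
uint32_t lui_model(uint16_t immediate)
{
    return (uint32_t)immediate << 16;
}

/* Build a 32-bit constant from two 16-bit halves. */
uint32_t load_const32(uint16_t upper, uint16_t lower)
{
    return ori_model(lui_model(upper), lower);
}
```

Because the immediate is zero-extended rather than sign-extended, an immediate with bit 15 set does not disturb the upper half of the result.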


Restrictions: 


None 


Operation: 
GPR[rt] ← zero_extend(immediate) or GPR[rs]


Exceptions: 


None 
Prefetch (R10000 only)                                           PREF


 31     26 25    21 20    16 15                              0
+---------+--------+--------+---------------------------------+
|  PREF   |  base  |  hint  |             offset              |
| 110011  |        |        |                                 |
+---------+--------+--------+---------------------------------+
     6        5        5                    16

Format: PREF hint, offset(base)                               MIPS IV

Purpose: To prefetch data from memory.


Description: prefetch_memory(base+offset)


PREF adds the 16-bit signed offset to the contents of GPR base to form an effective byte 
address. It advises that data at the effective address may be used in the near future. The hint 
field supplies information about the way that the data is expected to be used. 


PREF is an advisory instruction. It may change the performance of the program. For all hint 
values and all effective addresses, it neither changes architecturally-visible state nor alters the 
meaning of the program. An implementation may do nothing when executing a PREF 
instruction. 


If MIPS IV instructions are supported and enabled, PREF does not cause addressing-related 
exceptions. If it raises an exception condition, the exception condition is ignored. If an 
addressing-related exception condition is raised and ignored, no data will be prefetched. Even
if no data is prefetched in such a case, some action that is not architecturally-visible, such as 
writeback of a dirty cache line, might take place. 


PREF will never generate a memory operation for a location with an uncached memory access 
type (see 1.6 Memory Access Types). 


If PREF results in a memory operation, the memory access type used for the operation is 
determined by the memory access type of the effective address, just as it would be if the memory 
operation had been caused by a load or store to the effective address. 


PREF enables the processor to take some action, typically prefetching the data into cache, to 
improve program performance. The action taken for a specific PREF instruction is both system 
and context dependent. Any action, including doing nothing, is permitted that does not change 
architecturally-visible state or alter the meaning of a program. It is expected that 
implementations will either do nothing or take an action that will increase the performance of 
the program. 


For a cached location, the expected, and useful, action is for the processor to prefetch a block 
of data that includes the effective address. The size of the block, and the level of the memory 
hierarchy it is fetched into are implementation specific. 


The hint field supplies information about the way the data is expected to be used. No hint value 
causes an action that modifies architecturally-visible state. A processor may use a hint value to 
improve the effectiveness of the prefetch action. The defined hint values and the recommended 
prefetch action are shown in the table below. The hint table may be extended in future 
implementations. 


Table 1-37 Values of Hint Field for Prefetch Instruction


Value   Name             Data use and desired prefetch action

0       load             Data is expected to be loaded (not modified).
                         Fetch data as if for a load.

1       store            Data is expected to be stored or modified.
                         Fetch data as if for a store.

2-3                      Not yet defined.

4       load_streamed    Data is expected to be loaded (not modified) but not reused
                         extensively; it will “stream” through cache.
                         Fetch data as if for a load and place it in the cache so that it
                         will not displace data prefetched as “retained”.

5       store_streamed   Data is expected to be stored or modified but not reused
                         extensively; it will “stream” through cache.
                         Fetch data as if for a store and place it in the cache so that it
                         will not displace data prefetched as “retained”.

6       load_retained    Data is expected to be loaded (not modified) and reused
                         extensively; it should be “retained” in the cache.
                         Fetch data as if for a load and place it in the cache so that it
                         will not be displaced by data prefetched as “streamed”.

7       store_retained   Data is expected to be stored or modified and reused
                         extensively; it should be “retained” in the cache.
                         Fetch data as if for a store and place it in the cache so that it
                         will not be displaced by data prefetched as “streamed”.

8-31                     Not yet defined.


Restrictions: 
None 


Operation: 


vAddr ← GPR[base] + sign_extend(offset)
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
Prefetch(uncached, pAddr, vAddr, DATA, hint)


Exceptions: 


Reserved Instruction 


Programming Notes: 
Prefetch cannot prefetch data from a mapped location unless the translation for that location is
present in the TLB. Locations in memory pages that have not been accessed recently may not 
have translations in the TLB, so prefetch may not be effective for such locations. 


Prefetch does not cause addressing exceptions. It will not cause an exception to prefetch using 
an address pointer value before the validity of a pointer is determined. 
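
In C, an advisory prefetch of this kind is usually requested through a compiler intrinsic rather than written directly. The following is a minimal sketch using the GCC/Clang builtin __builtin_prefetch(addr, rw, locality). Whether the compiler emits a PREF instruction, and which hint value it selects, depends on the compiler and target and is not specified here. The function name and the prefetch distance are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16u

/* Sums an array while requesting data ahead of its use. The prefetch
 * is advisory only: removing it must not change the result. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0: expected to be read; locality = 0: little reuse */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        sum += data[i];
    }
    return sum;
}
```

The bounds check keeps the pointer arithmetic inside the array; it is there for the benefit of the C language rules, not because the prefetch itself could fault.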


Implementation Notes: 


It is recommended that a reserved hint field value either cause a default prefetch action that is
expected to be useful for most cases of data use, such as the “load” hint, or cause the instruction
to be treated as a NOP.
SB                                                         Store Byte


 31     26 25    21 20    16 15                              0
+---------+--------+--------+---------------------------------+
|   SB    |  base  |   rt   |             offset              |
| 101000  |        |        |                                 |
+---------+--------+--------+---------------------------------+
     6        5        5                    16

Format: SB rt, offset(base)                                   MIPS I

Purpose: To store a byte to memory.


Description: memory[base+offset] ← rt


The least-significant 8-bit byte of GPR rt is stored in memory at the location specified by the 
effective address. The 16-bit signed offset is added to the contents of GPR base to form the 
effective address. 


Restrictions: 


None 


Operation: 32-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor ReverseEndian^2)
byte ← vAddr_{1..0} xor BigEndianCPU^2
dataword ← GPR[rt]_{31-8*byte..0} || 0^{8*byte}
StoreMemory (uncached, BYTE, dataword, pAddr, vAddr, DATA)
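
The positioning of the byte within dataword can be modelled in C as a left shift by 8*byte bits, where byte is the low two address bits exclusive-ORed with the replicated endianness flag. This is a minimal sketch of that one step for the 32-bit case; the function name sb_dataword and the integer flag big_endian_cpu are hypothetical.

```c
#include <stdint.h>

/* Models the dataword computation of the 32-bit SB operation.
 * big_endian_cpu is 1 for big-endian, 0 for little-endian. */
uint32_t sb_dataword(uint32_t rt, uint32_t vaddr, int big_endian_cpu)
{
    /* byte = vAddr[1..0] xor BigEndianCPU replicated to two bits */
    uint32_t byte = (vaddr & 3u) ^ (big_endian_cpu ? 3u : 0u);
    /* GPR[rt] bits (31-8*byte)..0 followed by 8*byte zero bits */
    return rt << (8u * byte);
}
```

For a big-endian processor and an address whose low two bits are 0, the least-significant byte of rt is moved to bits 31..24 of dataword.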


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^3)
byte ← vAddr_{2..0} xor BigEndianCPU^3
datadouble ← GPR[rt]_{63-8*byte..0} || 0^{8*byte}
StoreMemory (uncached, BYTE, datadouble, pAddr, vAddr, DATA)


Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 

Bus Error 

Address Error 
Store Conditional Word                                             SC

 31     26 25    21 20    16 15                              0
+---------+--------+--------+---------------------------------+
|   SC    |  base  |   rt   |             offset              |
| 111000  |        |        |                                 |
+---------+--------+--------+---------------------------------+
     6        5        5                    16

Format: SC rt, offset(base)                                   MIPS II

Purpose: To store a word to memory to complete an atomic read-modify-write.


Description: if (atomic_update) then memory[base+offset] ← rt, rt ← 1 else rt ← 0


The LL and SC instructions provide primitives to implement atomic Read-Modify-Write 
(RMW) operations for cached memory locations. 


The 16-bit signed offset is added to the contents of GPR base to form an effective address. 


The SC completes the RMW sequence begun by the preceding LL instruction executed on the 
processor. If it would complete the RMW sequence atomically, then the least-significant 32-bit 
word of GPR rt is stored into memory at the location specified by the aligned effective address 
and a one, indicating success, is written into GPR rt. Otherwise, memory is not modified and a 
zero, indicating failure, is written into GPR rt. 


If any of the following events occurs between the execution of LL and SC, the SC will fail: 


• A coherent store is completed by another processor or coherent I/O module into the
  block of physical memory containing the word. The size and alignment of the block is
  implementation dependent. It is at least one word and is at most the minimum page
  size.


• An exception occurs on the processor executing the LL/SC.
  An implementation may detect “an exception” in one of three ways:
  1) Detect exceptions and fail when an exception occurs.
  2) Fail after the return-from-interrupt instruction (RFE or ERET) is executed.
  3) Do both 1 and 2.


If any of the following events occurs between the execution of LL and SC, the SC may succeed 
or it may fail; the success or failure is unpredictable. Portable programs should not cause one 
of these events. 


• A load, store, or prefetch is executed on the processor executing the LL/SC.


• The instructions executed starting with the LL and ending with the SC do not lie in a
  2048-byte contiguous region of virtual memory. The region does not have to be
  aligned, other than the alignment required for instruction words.


The following conditions must be true or the result of the SC will be undefined:


• Execution of SC must have been preceded by execution of an LL instruction.

• An RMW sequence executed without intervening exceptions must use the same address
  in the LL and SC. The address is the same if the virtual address, physical address, and
  cache-coherence algorithm are identical.


Atomic RMW is provided only for cached memory locations. The extent to which the detection 
of atomicity operates correctly depends on the system implementation and the memory access 
type used for the location. See 1.6 Memory Access Types. 


MP atomicity: To provide atomic RMW among multiple processors, all accesses to the 
location must be made with a memory access type of cached coherent. 


Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the
location must be made with memory access type of either cached noncoherent or cached
coherent. All accesses must be to one or the other access type; they may not be mixed.


I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location 
must be made with a memory access type of cached coherent. If the I/O system does not use 
coherent memory operations, then atomic RMW cannot be provided with respect to the I/O 
reads and writes. 


The definition above applies to user-mode operation on all MIPS processors that support the 
MIPS II architecture. There may be other implementation-specific events, such as privileged 
CP0 instructions, that will cause an SC instruction to fail in some cases. System programmers
using LL/SC should consult implementation-specific documentation. 


Restrictions: 


The addressed location must have a memory access type of cached noncoherent or cached 
coherent; if it does not, the result is undefined (see 1.6 Memory Access Types). 


The effective address must be naturally aligned. If either of the two least-significant bits of the
address is non-zero, an Address Error exception occurs.


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
dataword ← GPR[rt]
if LLbit then
    StoreMemory (uncached, WORD, dataword, pAddr, vAddr, DATA)
endif
GPR[rt] ← 0^31 || LLbit
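
The role of LLbit in the operation above can be illustrated with a small software model. The following C sketch is a simplification under stated assumptions: it tracks a single memory word, it assumes that LL sets LLbit, and it represents an intervening coherent store by a function that clears LLbit. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model state: one memory word plus the LLbit. */
typedef struct {
    uint32_t word;   /* the memory word being updated */
    int      llbit;  /* models LLbit: 1 while the RMW sequence is intact */
} llsc_model_t;

/* Assumed behavior of LL: read the word and set LLbit. */
uint32_t ll_model(llsc_model_t *m)
{
    m->llbit = 1;
    return m->word;
}

/* Models SC: store only if LLbit is set; the return value is what
 * would be written back to GPR rt (1 = success, 0 = failure). */
uint32_t sc_model(llsc_model_t *m, uint32_t rt)
{
    if (m->llbit)
        m->word = rt;
    return (uint32_t)m->llbit;
}

/* Models a coherent store by another processor: it changes the
 * word and breaks the pending RMW sequence. */
void remote_store_model(llsc_model_t *m, uint32_t value)
{
    m->word = value;
    m->llbit = 0;
}
```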


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^2))
byte ← vAddr_{2..0} xor (BigEndianCPU || 0^2)
datadouble ← GPR[rt]_{63-8*byte..0} || 0^{8*byte}
if LLbit then
    StoreMemory (uncached, WORD, datadouble, pAddr, vAddr, DATA)
endif
GPR[rt] ← 0^63 || LLbit

Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 
Address Error 


Reserved Instruction 


Programming Notes: 


LL and SC are used to atomically update memory locations as shown in the example atomic 
increment operation below. 


L1:
    LL    T1, (T0)      # load counter
    ADDI  T2, T1, 1     # increment
    SC    T2, (T0)      # try to store, checking for atomicity
    BEQ   T2, 0, L1     # if not atomic (0), try again
    NOP                 # branch-delay slot


Exceptions between the LL and SC cause SC to fail, so persistent exceptions must be avoided.
Examples of these are arithmetic operations that trap, system calls, and floating-point
operations that trap or require software emulation assistance.


LL and SC function on a single processor for cached noncoherent memory so that parallel 
programs can be run on uniprocessor systems that do not support cached coherent memory 
access types. 
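
The retry behavior of the LL/SC loop above can be modelled in Python. This is a minimal sketch under stated assumptions, not part of the architecture definition: the class LLSCMemory, its methods, and the interfere() hook are hypothetical names introduced for illustration, and clearing the link bit after every SC is a modelling choice.

```python
class LLSCMemory:
    """Toy one-word memory with a link bit (hypothetical model of LLbit)."""

    def __init__(self, value=0):
        self.value = value
        self.llbit = False

    def ll(self):
        # LL: read the word and set the link bit.
        self.llbit = True
        return self.value

    def sc(self, new_value):
        # SC: store only if the link bit is still set; return 1 or 0,
        # mirroring the value written to GPR rt.
        ok = self.llbit
        if ok:
            self.value = new_value
        self.llbit = False  # modelling choice, not stated by the manual
        return 1 if ok else 0

    def interfere(self, new_value):
        # Stands in for a coherent store by another agent or an exception:
        # either event breaks the link.
        self.value = new_value
        self.llbit = False


def atomic_increment(mem, disturb_first_try=False):
    """Retry loop equivalent to L1: LL / ADDI / SC / BEQ. Returns attempts used."""
    attempts = 0
    while True:
        attempts += 1
        t1 = mem.ll()
        t2 = t1 + 1
        if disturb_first_try and attempts == 1:
            mem.interfere(mem.value + 10)
        if mem.sc(t2):
            return attempts
```

With no interference the loop completes on the first attempt. When the link is broken between LL and SC, the SC fails without modifying memory and the sequence is repeated with the freshly loaded value.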


Implementation Notes: 


The block of memory that is “locked” for LL/SC is typically the largest cache line in use. 




SCD Store Conditional Doubleword
31 26 25 21 20 16 15 0
SCD base rt offset
111100 
6 5 5 16 
Format: SCD rt, offset(base) MIPS III 
Purpose: To store a doubleword to memory to complete an atomic read-modify-write. 


Description: if (atomic_update) then memory[base+offset] ← rt, rt ← 1 else rt ← 0
The 16-bit signed offset is added to the contents of GPR base to form an effective address. 


The SCD completes the RMW sequence begun by the preceding LLD instruction executed on 
the processor. If it would complete the RMW sequence atomically, then the 64-bit doubleword 
of GPR rt is stored into memory at the location specified by the aligned effective address and a 
one, indicating success, is written into GPR rt. Otherwise, memory is not modified and a zero, 
indicating failure, is written into GPR rt. 


If any of the following events occurs between the execution of LLD and SCD, the SCD will fail: 


• A coherent store is completed by another processor or coherent I/O module into the
  block of physical memory containing the word. The size and alignment of the block are
  implementation dependent. It is at least one doubleword and is at most the minimum
  page size.

• An exception occurs on the processor executing the LLD/SCD.
  An implementation may detect “an exception” in one of three ways:
  1) Detect exceptions and fail when an exception occurs.
  2) Fail after the return-from-interrupt instruction (RFE or ERET) is executed.
  3) Do both 1 and 2.


If any of the following events occurs between the execution of LLD and SCD, the SCD may 
succeed or it may fail; the success or failure is unpredictable. Portable programs should not 
cause one of these events. 


• A memory access instruction (load, store, or prefetch) is executed on the processor
  executing the LLD/SCD.

• The instructions executed starting with the LLD and ending with the SCD do not lie in
  a 2048-byte contiguous region of virtual memory. The region does not have to be
  aligned, other than the alignment required for instruction words.

The following conditions must be true or the result of the SCD will be undefined: 


• Execution of SCD must have been preceded by execution of an LLD instruction.

• A RMW sequence executed without intervening exceptions must use the same address
  in the LLD and SCD. The address is the same if the virtual address, physical address,
  and cache-coherence algorithm are identical.


Atomic RMW is provided only for memory locations with cached noncoherent or cached 
coherent memory access types. The extent to which the detection of atomicity operates 
correctly depends on the system implementation and the memory access type used for the 
location. See 1.6 Memory Access Types. 


MP atomicity: To provide atomic RMW among multiple processors, all accesses to the 
location must be made with a memory access type of cached coherent. 


Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the 
location must be made with memory access type of either cached noncoherent or cached 
coherent. All accesses must be to one or the other access type, they may not be mixed. 


I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location 
must be made with a memory access type of cached coherent. If the I/O system does not use 
coherent memory operations, then atomic RMW cannot be provided with respect to the I/O 
reads and writes. 


The definition above applies to user-mode operation on all MIPS processors that support the
MIPS II architecture. There may be other implementation-specific events, such as privileged
CP0 instructions, that will cause an SCD instruction to fail in some cases. System programmers
using LLD/SCD should consult implementation-specific documentation.


Restrictions: 


The addressed location must have a memory access type of cached noncoherent or cached 
coherent; if it does not, the result is undefined (see 1.6 Memory Access Types). The 64-bit 
doubleword of register rt is conditionally stored in memory at the location specified by the 
aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form 
the effective address. 


The effective address must be naturally aligned. If any of the three least-significant bits of the 
address are non-zero, an Address Error exception occurs. 


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^3 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
datadouble ← GPR[rt]
if LLbit then
    StoreMemory (uncached, DOUBLEWORD, datadouble, pAddr, vAddr, DATA)
endif
GPR[rt] ← 0^63 || LLbit


Exceptions: 
TLB Refill, TLB Invalid 
TLB Modified 
Address Error 


Reserved Instruction 


Programming Notes: 


LLD and SCD are used to atomically update memory locations as shown in the example atomic 
increment operation below. 


L1:
    LLD   T1, (T0)     # load counter
    ADDI  T2, T1, 1    # increment
    SCD   T2, (T0)     # try to store, checking for atomicity
    BEQ   T2, 0, L1    # if not atomic (0), try again
    NOP                # branch-delay slot


Exceptions between the LLD and SCD cause SCD to fail, so persistent exceptions must be 
avoided. Some examples of these are arithmetic operations that trap, system calls, floating- 
point operations that trap or require software emulation assistance. 


LLD and SCD function on a single processor for cached noncoherent memory so that parallel 
programs can be run on uniprocessor systems that do not support cached coherent memory 
access types. 


Implementation Notes: 
The block of memory that is “locked” for LLD/SCD is typically the largest cache line in use. 


Store Doubleword SD

31 26 25 21 20 16 15 0
SD base rt offset
111111
6 5 5 16
Format: SD rt, offset(base) MIPS III
Purpose: To store a doubleword to memory.


Description: memory[base+offset] ← rt


The 64-bit doubleword in GPR rt is stored in memory at the location specified by the aligned
effective address. The 16-bit signed offset is added to the contents of GPR base to form the
effective address.


Restrictions: 


The effective address must be naturally aligned. If any of the three least-significant bits of the 
effective address are non-zero, an Address Error exception occurs. 


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the
instruction is undefined.
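
The address formation and the natural-alignment rule can be sketched as follows. The helper names effective_address and is_naturally_aligned are hypothetical and exist only for this illustration; the sketch assumes a 64-bit address space by default.

```python
def effective_address(base, offset16, addr_bits=64):
    """Add the sign-extended 16-bit offset field to base, modulo 2**addr_bits."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # bit 15 set: the offset is negative
        offset -= 0x10000
    return (base + offset) & ((1 << addr_bits) - 1)


def is_naturally_aligned(vaddr, size_bytes):
    """True when the low log2(size_bytes) address bits are all zero.

    size_bytes must be a power of two (2 for SH, 4 for SW, 8 for SD).
    """
    return (vaddr & (size_bytes - 1)) == 0
```

For a doubleword store the check covers the three least-significant address bits. An address such as 0xFFC passes the word check but fails the doubleword check, which corresponds to the Address Error case described above.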


Operation: 64-bit processors 
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^3 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
datadouble ← GPR[rt]
StoreMemory (uncached, DOUBLEWORD, datadouble, pAddr, vAddr, DATA)


Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 

Address Error 

Reserved Instruction 
SDCz Store Doubleword From Coprocessor
31 26 25 21 20 16 15 0
SDCz base rt offset
1111zz
6 5 5 16
Format: SDC1 rt, offset(base) MIPS II
        SDC2 rt, offset(base)
Purpose: To store a doubleword from a coprocessor general register to memory.


Description: memory[base+offset] ← rt


Coprocessor unit zz supplies a 64-bit doubleword which is stored at the memory location 
specified by the aligned effective address. The 16-bit signed offset is added to the contents of 
GPR base to form the effective address. 


The data supplied by each coprocessor is defined by the individual coprocessor specifications. 
The usual operation would read the data from coprocessor general register rt. 


Each MIPS architecture level defines up to 4 coprocessor units, numbered 0 to 3 (see 
1.2.5 Coprocessor Instructions). The opcodes corresponding to coprocessors that are not 
defined by an architecture level may be used for other instructions. 


Restrictions: 


Access to the coprocessors is controlled by system software. Each coprocessor has a 
“coprocessor usable” bit in the System Control coprocessor. The usable bit must be set for a 
user program to execute a coprocessor instruction. If the usable bit is not set, an attempt to 
execute the instruction will result in a Coprocessor Unusable exception. An unimplemented 
coprocessor must never be enabled. The result of executing this instruction for an 
unimplemented coprocessor when the usable bit is set, is undefined. 


This instruction is not defined for coprocessor 0, the System Control coprocessor, and the 
opcode may be used for other instructions. 


The effective address must be naturally aligned. If any of the three least-significant bits of the 
effective address are non-zero, an Address Error exception occurs. 


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^3 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
datadouble ← COP_SD(z, rt)
StoreMemory (uncached, DOUBLEWORD, datadouble, pAddr, vAddr, DATA)




Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{2..0}) ≠ 0^3 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
datadouble ← COP_SD(z, rt)
StoreMemory (uncached, DOUBLEWORD, datadouble, pAddr, vAddr, DATA)


Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 

Address Error 

Reserved Instruction 
Coprocessor Unusable 
SDL Store Doubleword Left
31 26 25 21 20 16 15 0
SDL base rt offset
101100
6 5 5 16
Format: SDL rt, offset(base) MIPS III
Purpose: To store the most-significant part of a doubleword to an unaligned memory
address.


Description: memory[base+offset] ← Some_Bytes_From rt


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the most-significant of eight consecutive bytes forming a 
doubleword in memory (DW) starting at an arbitrary byte boundary. A part of DW, the most- 
significant one to eight bytes, is in the aligned doubleword containing EffAddr. The same 
number of most-significant (left) bytes of GPR rt are stored into these bytes of DW. 


The figure below illustrates this operation for big-endian byte ordering. The eight consecutive 
bytes in 2..9 form an unaligned doubleword starting at location 2. A part of DW, six bytes, is 
contained in the aligned doubleword containing the most-significant byte at 2. First, SDL stores 
the six most-significant bytes of the source register into these bytes in memory. Next, the 
complementary SDR instruction stores the remainder of DW. 


Doubleword at byte 2 in memory (big-endian) - each memory byte contains its address

         most   - significance -   least
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15    Memory

 A  B  C  D  E  F  G  H                            GPR 24

After executing
 0  1  A  B  C  D  E  F  8  9 10 ...               SDL $24,2($0)
Then after
 0  1  A  B  C  D  E  F  G  H 10 ...               SDR $24,9($0)


Figure 1-6 Unaligned Doubleword Store with SDL and SDR


The bytes stored from the source register to memory depend on both the offset of the effective 
address within an aligned doubleword, i.e. the low three bits of the address (vAddr_{2..0}), and the
current byte ordering mode of the processor (big- or little-endian). The table below shows the 
bytes stored for every combination of offset and byte ordering. 


Table 1-38 Bytes Stored by SDL Instruction


Initial memory contents and byte offsets       Contents of source register
most - significance - least                    most - significance - least
0  1  2  3  4  5  6  7   ← big-endian offset
i  j  k  l  m  n  o  p                         A  B  C  D  E  F  G  H
7  6  5  4  3  2  1  0   ← little-endian offset

Memory contents after instruction (lowercase bytes are unchanged; shaded in the original)

Big-endian byte ordering    vAddr_{2..0}    Little-endian byte ordering
A  B  C  D  E  F  G  H           0          i  j  k  l  m  n  o  A
i  A  B  C  D  E  F  G           1          i  j  k  l  m  n  A  B
i  j  A  B  C  D  E  F           2          i  j  k  l  m  A  B  C
i  j  k  A  B  C  D  E           3          i  j  k  l  A  B  C  D
i  j  k  l  A  B  C  D           4          i  j  k  A  B  C  D  E
i  j  k  l  m  A  B  C           5          i  j  A  B  C  D  E  F
i  j  k  l  m  n  A  B           6          i  A  B  C  D  E  F  G
i  j  k  l  m  n  o  A           7          A  B  C  D  E  F  G  H
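
The rows of the SDL table can be generated from the offset and the byte ordering. The sketch below is an illustration only: sdl_result is a hypothetical helper, and the aligned doubleword and the register are represented as 8-character strings written most-significant byte first, as in the table.

```python
def sdl_result(mem, reg, offset, big_endian=True):
    """Return the aligned doubleword after SDL.

    mem, reg: 8-character strings, most-significant byte first.
    offset:   the low three bits of the effective address.
    """
    # byte <- vAddr[2..0] xor BigEndianCPU^3, as in the Operation section.
    byte = offset ^ (7 if big_endian else 0)
    # SDL stores the (byte + 1) most-significant register bytes into the
    # least-significant end of the aligned doubleword.
    return mem[:7 - byte] + reg[:byte + 1]
```

Calling the function with mem="ijklmnop" and reg="ABCDEFGH" for offsets 0 to 7 reproduces the big-endian and little-endian columns of the table.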
Restrictions: 


None 


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{(PSIZE-1)..3} || (pAddr_{2..0} xor ReverseEndian^3)
If BigEndianMem = 0 then
    pAddr ← pAddr_{(PSIZE-1)..3} || 0^3
endif
byte ← vAddr_{2..0} xor BigEndianCPU^3
datadouble ← 0^{56-8*byte} || GPR[rt]_{63..56-8*byte}
StoreMemory (uncached, byte, datadouble, pAddr, vAddr, DATA)


Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 

Bus Error 

Address Error 

Reserved Instruction 
SDR Store Doubleword Right
31 26 25 21 20 16 15 0
SDR base rt offset
101101
6 5 5 16
Format: SDR rt, offset(base) MIPS III
Purpose: To store the least-significant part of a doubleword to an unaligned memory
address.


Description: memory[base+offset] ← Some_Bytes_From rt


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the least-significant of eight consecutive bytes forming a 
doubleword in memory (DW) starting at an arbitrary byte boundary. A part of DW, the least- 
significant one to eight bytes, is in the aligned doubleword containing EffAddr. The same 
number of least-significant (right) bytes of GPR rt are stored into these bytes of DW. 


The figure below illustrates this operation for big-endian byte ordering. The eight consecutive 
bytes in 2..9 form an unaligned doubleword starting at location 2. A part of DW, two bytes, is 
contained in the aligned doubleword containing the least-significant byte at 9. First, SDR stores 
the two least-significant bytes of the source register into these bytes in memory. Next, the 
complementary SDL stores the remainder of DW. 


Doubleword at byte 2 in memory, big-endian byte order - each memory byte contains its address

         most   - significance -   least
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15    Memory

 A  B  C  D  E  F  G  H                            GPR 24

After executing
 0  1  2  3  4  5  6  7  G  H 10 ...               SDR $24,9($0)
Then after
 0  1  A  B  C  D  E  F  G  H 10 ...               SDL $24,2($0)


Figure 1-7 Unaligned Doubleword Store with SDR and SDL
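
The two-step store in the figure can be traced with a short Python model. This is a sketch for big-endian byte ordering only; store_unaligned_doubleword is a hypothetical helper, memory is a list indexed by byte address, and the register is a list of eight bytes with the most-significant byte first.

```python
def store_unaligned_doubleword(memory, addr, reg):
    """Big-endian model of SDR at addr+7 followed by SDL at addr.

    Returns (memory after SDR, memory after SDR and SDL) as new lists.
    """
    mem = list(memory)

    # SDR is given the address of the least-significant byte of the doubleword.
    last = addr + 7
    base = last & ~7                # aligned doubleword containing that byte
    count = (last & 7) + 1          # least-significant register bytes stored
    mem[base:base + count] = reg[8 - count:]
    after_sdr = list(mem)

    # SDL is given the address of the most-significant byte of the doubleword.
    base = addr & ~7
    count = 8 - (addr & 7)          # most-significant register bytes stored
    mem[addr:base + 8] = reg[:count]
    return after_sdr, mem
```

With a 16-byte memory in which each byte holds its own address, addr = 2 and the register bytes A to H, the first returned list matches the row after SDR $24,9($0) and the second matches the row after SDL $24,2($0).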




The bytes stored from the source register to memory depend on both the offset of the effective
address within an aligned doubleword, i.e. the low three bits of the address (vAddr_{2..0}), and the
current byte ordering mode of the processor (big- or little-endian). The table below shows the
bytes stored for every combination of offset and byte ordering.


Table 1-39 Bytes Stored by SDR Instruction


Initial memory contents and byte offsets       Contents of source register
most - significance - least                    most - significance - least
0  1  2  3  4  5  6  7   ← big-endian offset
i  j  k  l  m  n  o  p                         A  B  C  D  E  F  G  H
7  6  5  4  3  2  1  0   ← little-endian offset

Memory contents after instruction (lowercase bytes are unchanged; shaded in the original)

Big-endian byte ordering    vAddr_{2..0}    Little-endian byte ordering
H  j  k  l  m  n  o  p           0          A  B  C  D  E  F  G  H
G  H  k  l  m  n  o  p           1          B  C  D  E  F  G  H  p
F  G  H  l  m  n  o  p           2          C  D  E  F  G  H  o  p
E  F  G  H  m  n  o  p           3          D  E  F  G  H  n  o  p
D  E  F  G  H  n  o  p           4          E  F  G  H  m  n  o  p
C  D  E  F  G  H  o  p           5          F  G  H  l  m  n  o  p
B  C  D  E  F  G  H  p           6          G  H  k  l  m  n  o  p
A  B  C  D  E  F  G  H           7          H  j  k  l  m  n  o  p
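
The rows of the SDR table follow the same pattern with the roles of the two ends exchanged. As before this is an illustrative sketch: sdr_result is a hypothetical helper, and both operands are 8-character strings written most-significant byte first.

```python
def sdr_result(mem, reg, offset, big_endian=True):
    """Return the aligned doubleword after SDR.

    mem, reg: 8-character strings, most-significant byte first.
    offset:   the low three bits of the effective address.
    """
    # byte <- vAddr[2..0] xor BigEndianCPU^3, as in the Operation section.
    byte = offset ^ (7 if big_endian else 0)
    # SDR stores the (8 - byte) least-significant register bytes into the
    # most-significant end of the aligned doubleword.
    return reg[byte:] + mem[8 - byte:]
```

Calling the function with mem="ijklmnop" and reg="ABCDEFGH" for offsets 0 to 7 reproduces both columns of the table.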


Restrictions: 


None 


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{(PSIZE-1)..3} || (pAddr_{2..0} xor ReverseEndian^3)
If BigEndianMem = 0 then
    pAddr ← pAddr_{(PSIZE-1)..3} || 0^3
endif
byte ← vAddr_{2..0} xor BigEndianCPU^3
datadouble ← GPR[rt]_{63-8*byte..0} || 0^{8*byte}
StoreMemory (uncached, DOUBLEWORD-byte, datadouble, pAddr, vAddr, DATA)


Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 

Bus Error 

Address Error 

Reserved Instruction 
SH Store Halfword
31 26 25 21 20 16 15 0
SH base rt offset
101001
6 5 5 16
Format: SH rt, offset(base) MIPS I
Purpose: To store a halfword to memory.


Description: memory[base+offset] ← rt


The least-significant 16-bit halfword of register rt is stored in memory at the location specified 
by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base 
to form the effective address. 


Restrictions: 


The effective address must be naturally aligned. If the least-significant bit of the address is non- 
zero, an Address Error exception occurs. 


MIPS IV: The low-order bit of the offset field must be zero. If it is not, the result of the 
instruction is undefined. 


Operation: 32-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_0) ≠ 0 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0))
byte ← vAddr_{1..0} xor (BigEndianCPU || 0)
dataword ← GPR[rt]_{31-8*byte..0} || 0^{8*byte}
StoreMemory (uncached, HALFWORD, dataword, pAddr, vAddr, DATA)


Operation: 64-bit processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_0) ≠ 0 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian^2 || 0))
byte ← vAddr_{2..0} xor (BigEndianCPU^2 || 0)
datadouble ← GPR[rt]_{63-8*byte..0} || 0^{8*byte}
StoreMemory (uncached, HALFWORD, datadouble, pAddr, vAddr, DATA)
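
The byte-lane calculation in the 64-bit operation can be checked numerically. The sketch below is an illustration under assumptions: sh_datadouble is a hypothetical helper, and it models only the computation of byte and datadouble, not address translation or the memory access itself.

```python
def sh_datadouble(rt_value, vaddr, big_endian_cpu):
    """Return (byte, datadouble) as computed for SH on a 64-bit processor."""
    # BigEndianCPU^2 || 0 is the 3-bit pattern 110 when BigEndianCPU = 1.
    byte = (vaddr & 7) ^ (0b110 if big_endian_cpu else 0)
    # GPR[rt] shifted left by 8*byte bits, with the low bits zero-filled.
    datadouble = (rt_value << (8 * byte)) & 0xFFFFFFFFFFFFFFFF
    return byte, datadouble
```

For a halfword at an address whose low three bits are 2, a big-endian processor places the data in bits 47..32 of the doubleword (byte = 4), while a little-endian processor places it in bits 31..16 (byte = 2).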


Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 
Address Error 
Shift Word Left Logical SLL
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa SLL
000000 00000 000000
6 5 5 5 5 6

Format: SLL rd, rt, sa MIPS I


Purpose: To left shift a word by a fixed number of bits.


Description: rd ← rt << sa


The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeroes into the 
emptied bits; the word result is placed in GPR rd. The bit shift count is specified by sa. If rd 
is a 64-bit register, the result word is sign-extended. 


Restrictions: 


None 


Operation: 


s ← sa
temp ← GPR[rt]_{(31-s)..0} || 0^s
GPR[rd] ← sign_extend(temp)


Exceptions: 


None 


Programming Notes: 


Unlike nearly all other word operations the input operand does not have to be a properly sign- 
extended word value to produce a valid sign-extended 32-bit result. The result word is always 
sign extended into a 64-bit destination register; this instruction with a zero shift amount 
truncates a 64-bit value to 32 bits and sign extends it. 
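
The truncate-and-sign-extend behavior can be illustrated with a small model. This is a sketch only: sign_extend32 and sll are hypothetical helper names, and the result is returned as a signed Python integer standing for the sign-extended register contents.

```python
def sign_extend32(word):
    """Interpret the low 32 bits of word as a two's complement value."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def sll(rt_value, sa):
    """Shift the low-order 32 bits of rt left by sa, then sign-extend."""
    return sign_extend32((rt_value & 0xFFFFFFFF) << sa)
```

With sa = 0 the function discards the upper 32 bits of the input and sign-extends bit 31, which is the truncation described in the note above.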


Some assemblers, particularly 32-bit assemblers, treat this instruction with a shift amount of 
zero as a NOP and either delete it or replace it with an actual NOP. 
SLLV Shift Word Left Logical Variable
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SLLV
000000 00000 000100
6 5 5 5 5 6
Format: SLLV rd, rt, rs MIPS I

Purpose: To left shift a word by a variable number of bits.


Description: rd ← rt << rs


The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeroes into the
emptied bits; the result word is placed in GPR rd. The bit shift count is specified by the
low-order five bits of GPR rs. If rd is a 64-bit register, the result word is sign-extended.
Restrictions: 


None 


Operation: 


s ← GPR[rs]_{4..0}
temp ← GPR[rt]_{(31-s)..0} || 0^s
GPR[rd] ← sign_extend(temp)


Exceptions: 


None 


Programming Notes: 


Unlike nearly all other word operations the input operand does not have to be a properly sign- 
extended word value to produce a valid sign-extended 32-bit result. The result word is always 
sign extended into a 64-bit destination register; this instruction with a zero shift amount 
truncates a 64-bit value to 32 bits and sign extends it. 


Some assemblers, particularly 32-bit assemblers, treat this instruction with a shift amount of 
zero as a NOP and either delete it or replace it with an actual NOP. 
Set On Less Than SLT
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SLT
000000 00000 101010
6 5 5 5 5 6
Format: SLT rd, rs, rt MIPS I
Purpose: To record the result of a less-than comparison.


Description: rd ← (rs < rt)


Compare the contents of GPR rs and GPR rt as signed integers and record the Boolean result of 
the comparison in GPR rd. If GPR rs is less than GPR rt the result is 1 (true), otherwise 0 
(false). 


The arithmetic comparison does not cause an Integer Overflow exception. 
Restrictions: 
None 


Operation: 


if GPR[rs] < GPR[rt] then
    GPR[rd] ← 0^{GPRLEN-1} || 1
else
    GPR[rd] ← 0^{GPRLEN}
endif


Exceptions: 


None 
SLTI Set on Less Than Immediate
31 26 25 21 20 16 15 0
SLTI rs rt immediate
001010
6 5 5 16
Format: SLTI rt, rs, immediate MIPS I
Purpose: To record the result of a less-than comparison with a constant.


Description: rt ← (rs < immediate)


Compare the contents of GPR rs and the 16-bit signed immediate as signed integers and record 
the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate the result is 
1 (true), otherwise 0 (false). 


The arithmetic comparison does not cause an Integer Overflow exception. 
Restrictions: 
None 


Operation: 


if GPR[rs] < sign_extend(immediate) then
    GPR[rt] ← 0^{GPRLEN-1} || 1
else
    GPR[rt] ← 0^{GPRLEN}
endif


Exceptions: 


None 
Set on Less Than Immediate Unsigned SLTIU
31 26 25 21 20 16 15 0
SLTIU rs rt immediate
001011
6 5 5 16
Format: SLTIU rt, rs, immediate MIPS I
Purpose: To record the result of an unsigned less-than comparison with a constant.


Description: rt ← (rs < immediate)


Compare the contents of GPR rs and the sign-extended 16-bit immediate as unsigned integers 
and record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate 
the result is 1 (true), otherwise 0 (false). 


Because the 16-bit immediate is sign-extended before comparison, the instruction is able to 
represent the smallest or largest unsigned numbers. The representable values are at the 
minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned 
range. 
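
The effect of sign-extending the immediate before an unsigned comparison can be seen in a short model. This is a sketch under assumptions: sltiu is a hypothetical helper, GPRLEN is taken to be 64 by default, and register values are passed as non-negative bit patterns.

```python
def sltiu(rs_value, imm16, gprlen=64):
    """Return 1 if rs < sign_extend(imm16) when both are read as unsigned."""
    mask = (1 << gprlen) - 1
    imm = imm16 & 0xFFFF
    if imm & 0x8000:             # sign-extend the 16-bit immediate
        imm -= 0x10000
    return 1 if (rs_value & mask) < (imm & mask) else 0
```

An immediate of 0xFFFF becomes max_unsigned after sign extension, so every register value except max_unsigned itself compares as less than it. An immediate of 0x8000 becomes max_unsigned-32767, the lower end of the upper representable range.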


The arithmetic comparison does not cause an Integer Overflow exception. 


Restrictions: 


None 


Operation: 
if (0 || GPR[rs]) < (0 || sign_extend(immediate)) then
    GPR[rt] ← 0^{GPRLEN-1} || 1
else
    GPR[rt] ← 0^{GPRLEN}
endif


Exceptions: 


None 
SLTU Set on Less Than Unsigned
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SLTU
000000 00000 101011
6 5 5 5 5 6
Format: SLTU rd, rs, rt MIPS I

Purpose: To record the result of an unsigned less-than comparison.


Description: rd ← (rs < rt)


Compare the contents of GPR rs and GPR rt as unsigned integers and record the Boolean result 
of the comparison in GPR rd. If GPR rs is less than GPR rt the result is 1 (true), otherwise 0 
(false). 


The arithmetic comparison does not cause an Integer Overflow exception. 
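
The difference between the signed comparison of SLT and the unsigned comparison of SLTU shows up when the same bit patterns are compared both ways. The sketch below is illustrative: slt and sltu are hypothetical helper names, GPRLEN is assumed to be 64 by default, and operands are passed as register bit patterns.

```python
def slt(rs_value, rt_value, gprlen=64):
    """Signed comparison: interpret both operands as two's complement."""
    sign = 1 << (gprlen - 1)
    mask = (1 << gprlen) - 1

    def signed(value):
        value &= mask
        return value - (1 << gprlen) if value & sign else value

    return 1 if signed(rs_value) < signed(rt_value) else 0


def sltu(rs_value, rt_value, gprlen=64):
    """Unsigned comparison of the same bit patterns."""
    mask = (1 << gprlen) - 1
    return 1 if (rs_value & mask) < (rt_value & mask) else 0
```

The all-ones pattern is -1 for SLT and the largest unsigned value for SLTU, so comparing it against 1 gives opposite results for the two instructions.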
Restrictions: 
None 
Operation: 
if (0 || GPR[rs]) < (0 || GPR[rt]) then
    GPR[rd] ← 0^{GPRLEN-1} || 1
else
    GPR[rd] ← 0^{GPRLEN}
endif


Exceptions: 


None 
Shift Word Right Arithmetic SRA
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa SRA
000000 00000 000011
6 5 5 5 5 6
Format: SRA rd, rt, sa MIPS I

Purpose: To arithmetic right shift a word by a fixed number of bits.


Description: rd ← rt >> sa (arithmetic)


The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit 
(bit 31) in the emptied bits; the word result is placed in GPR rd. The bit shift count is specified 
by sa. If rd is a 64-bit register, the result word is sign-extended. 
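
A minimal model of the arithmetic shift follows. It is a sketch only: sra is a hypothetical helper that assumes its input already holds a valid word value, so it simply reads the low 32 bits and returns the result as a signed Python integer standing for the sign-extended register.

```python
def sra(rt_value, sa):
    """Arithmetic right shift of the low-order 32-bit word by sa bits."""
    word = rt_value & 0xFFFFFFFF
    if word & 0x80000000:        # bit 31 is the sign bit of the word
        word -= 1 << 32
    # Python's >> on a negative int replicates the sign, like SRA.
    return word >> sa
```

A negative word stays negative for every shift amount, because copies of bit 31 fill the emptied positions.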


Restrictions: 
On 64-bit processors, if GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal) 
then the result of the operation is undefined. 
Operation: 
if (NotWordValue(GPR[rt])) then UndefinedResult() endif
s ← sa
temp ← (GPR[rt]_{31})^s || GPR[rt]_{31..s}
GPR[rd] ← sign_extend(temp)
Exceptions: 


None 
SRAV Shift Word Right Arithmetic Variable
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SRAV
000000 00000 000111
6 5 5 5 5 6
Format: SRAV rd, rt, rs MIPS I

Purpose: To arithmetic right shift a word by a variable number of bits.


Description: rd ← rt >> rs (arithmetic)


The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit 
(bit 31) in the emptied bits; the word result is placed in GPR rd. The bit shift count is specified 
by the low-order five bits of GPR rs. If rd is a 64-bit register, the result word is sign-extended. 


Restrictions: 


On 64-bit processors, if GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal) 
then the result of the operation is undefined. 
Operation: 
if (NotWordValue(GPR[rt])) then UndefinedResult() endif
s ← GPR[rs]_{4..0}
temp ← (GPR[rt]_{31})^s || GPR[rt]_{31..s}
GPR[rd] ← sign_extend(temp)
Exceptions: 


None 
Shift Word Right Logical SRL
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa SRL
000000 00000 000010
6 5 5 5 5 6

Format: SRL rd, rt, sa MIPS I


Purpose: To logical right shift a word by a fixed number of bits.


Description: rd ← rt >> sa (logical)


The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the 
emptied bits; the word result is placed in GPR rd. The bit shift count is specified by sa. If rd 
is a 64-bit register, the result word is sign-extended. 
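
The interaction between zero fill and the final sign extension can be seen in a short model. This is a sketch only: srl is a hypothetical helper that assumes a valid word value in the low 32 bits of its input and returns a signed Python integer standing for the sign-extended register.

```python
def srl(rt_value, sa):
    """Logical right shift of the low-order 32-bit word by sa bits."""
    word = (rt_value & 0xFFFFFFFF) >> sa   # zeros enter from the left
    # The 32-bit result is then sign-extended into the destination.
    return word - (1 << 32) if word & 0x80000000 else word
```

For any non-zero shift amount bit 31 of the result is zero, so the sign extension has a visible effect only when sa is 0 and bit 31 of the source word is set.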


Restrictions: 
On 64-bit processors, if GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal) 
then the result of the operation is undefined. 
Operation: 
if (NotWordValue(GPR[rt])) then UndefinedResult() endif
s ← sa
temp ← 0^s || GPR[rt]_{31..s}
GPR[rd] ← sign_extend(temp)
Exceptions: 


None 
SRLV Shift Word Right Logical Variable
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SRLV
000000 00000 000110
6 5 5 5 5 6
Format: SRLV rd, rt, rs MIPS I

Purpose: To logical right shift a word by a variable number of bits.


Description: rd ← rt >> rs (logical)


The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the 
emptied bits; the word result is placed in GPR rd. The bit shift count is specified by the low- 
order five bits of GPR rs. If rd is a 64-bit register, the result word is sign-extended. 


Restrictions: 
On 64-bit processors, if GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal) 
then the result of the operation is undefined. 
Operation: 
if (NotWordValue(GPR[rt])) then UndefinedResult() endif
s ← GPR[rs]_{4..0}
temp ← 0^s || GPR[rt]_{31..s}
GPR[rd] ← sign_extend(temp)
Exceptions: 


None 
Subtract Word SUB
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SUB
000000 00000 100010
6 5 5 5 5 6
Format: SUB rd, rs, rt MIPS I
Purpose: To subtract 32-bit integers. If overflow occurs, then trap.


Description: rd ← rs - rt


The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs to produce a 
32-bit result. If the subtraction results in 32-bit 2’s complement arithmetic overflow then the 
destination register is not modified and an Integer Overflow exception occurs. If it does not 
overflow, the 32-bit result is placed into GPR rd. 
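
The overflow rule can be sketched in Python. This is an illustration under assumptions: the IntegerOverflow class and the sub function are hypothetical stand-ins for the exception and the instruction, and operands are passed as signed 32-bit Python integers.

```python
class IntegerOverflow(Exception):
    """Stands in for the Integer Overflow exception."""


def sub(rs_value, rt_value):
    """Return rs - rt, or raise if the result does not fit in 32 bits."""
    result = rs_value - rt_value
    if not -(1 << 31) <= result <= (1 << 31) - 1:
        # The destination is left unmodified: nothing is returned.
        raise IntegerOverflow("32-bit two's complement overflow")
    return result
```

Subtracting 1 from the most negative 32-bit value raises the exception in this model, while the same operands would wrap around silently under SUBU.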


Restrictions: 
On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.


Operation: 


if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
temp ← GPR[rs] - GPR[rt]
if (32_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rd] ← temp
endif


Exceptions: 
Integer Overflow 
Programming Notes: 


SUBU performs the same arithmetic operation but does not trap on overflow.
SUBU Subtract Unsigned Word
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SUBU
000000 00000 100011
6 5 5 5 5 6
Format: SUBU rd, rs, rt MIPS I
Purpose: To subtract 32-bit integers.


Description: rd ← rs - rt


The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs and the 32-bit 
arithmetic result is placed into GPR rd. 


No integer overflow exception occurs under any circumstances. 


Restrictions: 
On 64-bit processors, if either GPR rt or GPR rs does not contain a sign-extended 32-bit value
(bits 63..31 equal), then the result of the operation is undefined.

Operation: 


if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
temp ← GPR[rs] - GPR[rt]
GPR[rd] ← temp


Exceptions: 
None 


Programming Notes: 


The term “unsigned” in the instruction name is a misnomer; this operation is 32-bit modulo 
arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, 
such as address arithmetic, or integer arithmetic environments that ignore overflow, such as “C” 
language arithmetic. 
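
The modulo behavior described in the note can be modelled directly. This is a sketch only: subu is a hypothetical helper, and the result is returned as a signed Python integer standing for the sign-extended 32-bit word.

```python
def subu(rs_value, rt_value):
    """32-bit modulo subtraction; never raises on overflow."""
    diff = (rs_value - rt_value) & 0xFFFFFFFF      # wrap modulo 2**32
    return diff - (1 << 32) if diff & 0x80000000 else diff
```

The most negative word minus 1 wraps to the most positive word instead of trapping, which is the behavior expected for address arithmetic and for C-style integer arithmetic that ignores overflow.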
SW Store Word
31 26 25 21 20 16 15 0
SW base rt offset
101011
6 5 5 16
Format: SW rt, offset(base) MIPS I
Purpose: To store a word to memory.


Description: memory[base+offset] ← rt


The least-significant 32-bit word of register rt is stored in memory at the location specified by
the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to
form the effective address.


Restrictions: 


The effective address must be naturally aligned. If either of the two least-significant bits of the
address are non-zero, an Address Error exception occurs.


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the
instruction is undefined.


Operation: 32-bit Processors 
vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
dataword ← GPR[rt]
StoreMemory (uncached, WORD, dataword, pAddr, vAddr, DATA)


Operation: 64-bit Processors 


vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^2))
byte ← vAddr_{2..0} xor (BigEndianCPU || 0^2)
datadouble ← GPR[rt]_{63-8*byte..0} || 0^{8*byte}
StoreMemory (uncached, WORD, datadouble, pAddr, vAddr, DATA)


Exceptions: 


TLB Refill, TLB Invalid 
TLB Modified 
Address Error 
SWCz Store Word From Coprocessor
31 26 25 21 20 16 15 0
SWCz base rt offset
1110zz
6 5 5 16
Format: SWC1 rt, offset(base) MIPS I
        SWC2 rt, offset(base)
        SWC3 rt, offset(base)
Purpose: To store a word from a coprocessor general register to memory.


Description: memory[base+offset] ← rt


Coprocessor unit zz supplies a 32-bit word which is stored at the memory location specified by 
the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to 
form the effective address. 


The data supplied by each coprocessor is defined by the individual coprocessor specifications. 
The usual operation would read the data from coprocessor general register rt. 


Each MIPS architecture level defines up to 4 coprocessor units, numbered 0 to 3 (see 
1.2.5 Coprocessor Instructions). The opcodes corresponding to coprocessors that are not 
defined by an architecture level may be used for other instructions. 


Restrictions: 


Access to the coprocessors is controlled by system software. Each coprocessor has a 
“coprocessor usable” bit in the System Control coprocessor. The usable bit must be set for a 
user program to execute a coprocessor instruction. If the usable bit is not set, an attempt to 
execute the instruction will result in a Coprocessor Unusable exception. An unimplemented 
coprocessor must never be enabled. The result of executing this instruction for an 
unimplemented coprocessor when the usable bit is set, is undefined. 


This instruction is not available for coprocessor 0, the System Control coprocessor, and the 
opcode may be used for other instructions. 


The effective address must be naturally aligned. If either of the two least-significant bits of the 
address are non-zero, an Address Error exception occurs. 


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit processors

vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
dataword ← COP_SW (z, rt)
StoreMemory (uncached, WORD, dataword, pAddr, vAddr, DATA)

Operation: 64-bit processors

vAddr ← sign_extend(offset) + GPR[base]
if (vAddr_{1..0}) ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^2))
byte ← vAddr_{2..0} xor (BigEndianCPU || 0^2)
dataword ← COP_SW (z, rt)
datadouble ← 0^{32-8*byte} || dataword || 0^{8*byte}
StoreMemory (uncached, WORD, datadouble, pAddr, vAddr, DATA)


Exceptions:
TLB Refill, TLB Invalid 
TLB Modified 
Address Error 
Reserved Instruction 
Coprocessor Unusable 
SWL Store Word Left

31 26 25 21 20 16 15 0
SWL base rt offset
101010
6 5 5 16

Format: SWL rt, offset(base) MIPS I
Purpose: To store the most-significant part of a word to an unaligned memory address.


Description: memory[base+offset] ← rt


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the most-significant of four consecutive bytes forming a 
word in memory (W) starting at an arbitrary byte boundary. A part of W, the most-significant 
one to four bytes, is in the aligned word containing EffAddr. The same number of the most- 
significant (left) bytes from the word in GPR rt are stored into these bytes of W. 


If GPR rt is a 64-bit register, the source word is the low word of the register. 


The figure below illustrates this operation for big-endian byte ordering for 32-bit and 64-bit 
registers. The four consecutive bytes in 2..5 form an unaligned word starting at location 2. A 
part of W, two bytes, is contained in the aligned word containing the most-significant byte at 2. 
First, SWL stores the most-significant two bytes of the low-word from the source register into 
these two bytes in memory. Next, the complementary SWR stores the remainder of the 
unaligned word. 


Word at byte 2 in memory, big-endian byte order; each memory byte contains its address.
Byte significance decreases from left to right.

    Memory:         0  1  2  3  4  5  6  7  ...   Initial contents
    64-bit GPR 24:  A  B  C  D  E  F  G  H
    32-bit GPR 24:              E  F  G  H
    Memory:         0  1  E  F  4  5  6  7  ...   After executing SWL $24,2($0)
    Memory:         0  1  E  F  G  H  6  7  ...   Then after SWR $24,5($0)

Figure 1-8 Unaligned Word Store using SWL and SWR

The bytes stored from the source register to memory depend on both the offset of the effective 
address within an aligned word, i.e. the low two bits of the address (vAddr_{1..0}), and the current 
byte ordering mode of the processor (big- or little-endian). The table below shows the bytes 
stored for every combination of offset and byte ordering. 


Table 1-40 Bytes Stored by SWL Instruction

Initial memory contents and byte offsets within the aligned word
(byte significance decreases from left to right):

    big-endian offset (vAddr_{1..0})       0  1  2  3
    memory contents                        i  j  k  l
    little-endian offset (vAddr_{1..0})    3  2  1  0

Initial contents of Dest Register (most to least significant byte):

    64-bit register    A B C D E F G H
    32-bit register            E F G H

Memory contents after instruction (bytes i, j, k, l are unchanged memory contents):

    Big-endian        vAddr_{1..0}    Little-endian
    byte ordering                     byte ordering
    E F G H           0               i j k E
    i E F G           1               i j E F
    i j E F           2               i E F G
    i j k E           3               E F G H
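
The table can be cross-checked with a small C model of the byte selection. This is a sketch under stated assumptions, not the processor's implementation: swl_model is a hypothetical name, mem holds the four bytes of the aligned word indexed by byte address offset, and rt is the low word of the source register.

```c
#include <stdint.h>

/* Hypothetical model of SWL on one aligned word.
   mem[0..3]: bytes of the aligned word, indexed by address offset.
   rt:        low 32 bits of the source register (E F G H in the table).
   offset:    vAddr[1..0]. */
static void swl_model(uint8_t mem[4], uint32_t rt, unsigned offset, int big_endian)
{
    offset &= 3u;
    if (big_endian) {
        /* The most-significant register byte goes to 'offset'; the
           following bytes go to higher addresses in the word. */
        for (unsigned k = 0; offset + k < 4; k++)
            mem[offset + k] = (uint8_t)(rt >> (24 - 8 * k));
    } else {
        /* The most-significant register byte goes to 'offset'; the
           following bytes go to lower addresses in the word. */
        for (unsigned k = 0; k <= offset; k++)
            mem[offset - k] = (uint8_t)(rt >> (24 - 8 * k));
    }
}
```

With rt holding the ASCII codes of E, F, G, H and memory holding i, j, k, l, an offset of 1 yields "i E F G" in big-endian order and "i j E F" (written from offset 3 down to offset 0) in little-endian order, matching the second row of the table.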


Operation: 32-bit Processors

vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{(PSIZE-1)..2} || (pAddr_{1..0} xor ReverseEndian^2)
If BigEndianMem = 0 then
    pAddr ← pAddr_{(PSIZE-1)..2} || 0^2
endif
byte ← vAddr_{1..0} xor BigEndianCPU^2
dataword ← 0^{24-8*byte} || GPR[rt]_{31..24-8*byte}
StoreMemory (uncached, byte, dataword, pAddr, vAddr, DATA)

Operation: 64-bit Processors

vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{(PSIZE-1)..3} || (pAddr_{2..0} xor ReverseEndian^3)
If BigEndianMem = 0 then
    pAddr ← pAddr_{(PSIZE-1)..2} || 0^2
endif
byte ← vAddr_{1..0} xor BigEndianCPU^2
if (vAddr_{2} xor BigEndianCPU) = 0 then
    datadouble ← 0^32 || 0^{24-8*byte} || GPR[rt]_{31..24-8*byte}
else
    datadouble ← 0^{24-8*byte} || GPR[rt]_{31..24-8*byte} || 0^32
endif
StoreMemory(uncached, byte, datadouble, pAddr, vAddr, DATA)


Exceptions:
TLB Refill, TLB Invalid 
TLB Modified 
Bus Error 
Address Error 



SWR Store Word Right

31 26 25 21 20 16 15 0
SWR base rt offset
101110
6 5 5 16

Format: SWR rt, offset(base) MIPS I
Purpose: To store the least-significant part of a word to an unaligned memory address.


Description: memory[base+offset] ← rt


The 16-bit signed offset is added to the contents of GPR base to form an effective address 
(EffAddr). EffAddr is the address of the least-significant of four consecutive bytes forming a 
word in memory (W) starting at an arbitrary byte boundary. A part of W, the least-significant 
one to four bytes, is in the aligned word containing EffAddr. The same number of the least- 
significant (right) bytes from the word in GPR rt are stored into these bytes of W. 


If GPR rt is a 64-bit register, the source word is the low word of the register. 


The figure below illustrates this operation for big-endian byte ordering for 32-bit and 64-bit 
registers. The four consecutive bytes in 2..5 form an unaligned word starting at location 2. A 
part of W, two bytes, is contained in the aligned word containing the least-significant byte at 5. 
First, SWR stores the least-significant two bytes of the low-word from the source register into 
these two bytes in memory. Next, the complementary SWL stores the remainder of the 
unaligned word. 


Word at byte 2 in memory, big-endian byte order; each memory byte contains its address.
Byte significance decreases from left to right.

    Memory:         0  1  2  3  4  5  6  7  ...   Initial contents
    64-bit GPR 24:  A  B  C  D  E  F  G  H
    32-bit GPR 24:              E  F  G  H
    Memory:         0  1  2  3  G  H  6  7  ...   After executing SWR $24,5($0)
    Memory:         0  1  E  F  G  H  6  7  ...   Then after SWL $24,2($0)

Figure 1-9 Unaligned Word Store using SWR and SWL

The bytes stored from the source register to memory depend on both the offset of the effective 
address within an aligned word, i.e. the low two bits of the address (vAddr_{1..0}), and the current 
byte ordering mode of the processor (big- or little-endian). The table below shows the bytes 
stored for every combination of offset and byte ordering. 


Table 1-41 Bytes Stored by SWR Instruction

Initial memory contents and byte offsets within the aligned word
(byte significance decreases from left to right):

    big-endian offset (vAddr_{1..0})       0  1  2  3
    memory contents                        i  j  k  l
    little-endian offset (vAddr_{1..0})    3  2  1  0

Initial contents of Dest Register (most to least significant byte):

    64-bit register    A B C D E F G H
    32-bit register            E F G H

Memory contents after instruction (bytes i, j, k, l are unchanged memory contents):

    Big-endian        vAddr_{1..0}    Little-endian
    byte ordering                     byte ordering
    H j k l           0               E F G H
    G H k l           1               F G H l
    F G H l           2               G H k l
    E F G H           3               H j k l
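
As a cross-check of the table and of the first step in Figure 1-9, the byte selection can be modeled in C. The sketch below rests on stated assumptions: swr_model is a hypothetical name, mem points at the four bytes of the aligned word indexed by byte address offset, and rt is the low word of the source register.

```c
#include <stdint.h>

/* Hypothetical model of SWR on one aligned word.
   mem[0..3]: bytes of the aligned word, indexed by address offset.
   rt:        low 32 bits of the source register (E F G H in the table).
   offset:    vAddr[1..0]. */
static void swr_model(uint8_t mem[4], uint32_t rt, unsigned offset, int big_endian)
{
    offset &= 3u;
    if (big_endian) {
        /* The least-significant register byte goes to 'offset'; the
           more significant bytes go to lower addresses in the word. */
        for (unsigned k = 0; k <= offset; k++)
            mem[offset - k] = (uint8_t)(rt >> (8 * k));
    } else {
        /* The least-significant register byte goes to 'offset'; the
           more significant bytes go to higher addresses in the word. */
        for (unsigned k = 0; offset + k < 4; k++)
            mem[offset + k] = (uint8_t)(rt >> (8 * k));
    }
}
```

Applied in big-endian mode to the aligned word at bytes 4..7 with offset 1 (effective address 5), the model writes G and H to bytes 4 and 5 and leaves bytes 6 and 7 alone, as in the figure.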


Restrictions: 


None 


Operation: 32-bit Processors

vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{(PSIZE-1)..2} || (pAddr_{1..0} xor ReverseEndian^2)
If BigEndianMem = 0 then
    pAddr ← pAddr_{(PSIZE-1)..2} || 0^2
endif
byte ← vAddr_{1..0} xor BigEndianCPU^2
dataword ← GPR[rt]_{31-8*byte..0} || 0^{8*byte}
StoreMemory (uncached, WORD-byte, dataword, pAddr, vAddr, DATA)

Operation: 64-bit Processors

vAddr ← sign_extend(offset) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr_{(PSIZE-1)..3} || (pAddr_{2..0} xor ReverseEndian^3)
If BigEndianMem = 0 then
    pAddr ← pAddr_{(PSIZE-1)..2} || 0^2
endif
byte ← vAddr_{1..0} xor BigEndianCPU^2
if (vAddr_{2} xor BigEndianCPU) = 0 then
    datadouble ← 0^32 || GPR[rt]_{31-8*byte..0} || 0^{8*byte}
else
    datadouble ← GPR[rt]_{31-8*byte..0} || 0^{8*byte} || 0^32
endif
StoreMemory(uncached, WORD-byte, datadouble, pAddr, vAddr, DATA)


Exceptions:
TLB Refill, TLB Invalid 
TLB Modified 
Bus Error 
Address Error 
SYNC Synchronize Shared Memory

31 26 25 11 10 6 5 0
SPECIAL 0 stype SYNC
000000 000 0000 0000 0000 001111
6 15 5 6

Format: SYNC (stype = 0 implied) MIPS II
Purpose: To order loads and stores to shared memory in a multiprocessor system.
Description:


To serve a broad audience, two descriptions are given. A simple description of SYNC that 
appeals to intuition is followed by a precise and detailed description. 


A Simple Description: 


SYNC affects only uncached and cached coherent loads and stores. The loads and stores that 
occur prior to the SYNC must be completed before the loads and stores after the SYNC are 
allowed to start. 


Loads are completed when the destination register is written. Stores are completed when the 
stored value is visible to every other processor in the system. 


A Precise Description: 


If the stype field has a value of zero, every synchronizable load and store that occurs in the 
instruction stream prior to the SYNC instruction must be globally performed before any 
synchronizable load or store that occurs after the SYNC may be performed with respect to any 
other processor or coherent I/O module. 


SYNC does not guarantee the order in which instruction fetches are performed. 
The stype values 1-31 are reserved; they produce the same result as the value zero. 


Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a 
physical location in shared memory using a virtual location with a memory access type of either 
uncached or cached coherent. Shared memory is memory that can be accessed by more than 
one processor or by a coherent I/O system module. 


1.6 Memory Access Types contains information on memory access types. 


Performed load: A load instruction is performed when the value returned by the load has been 
determined. The result of a load on processor A has been determined with respect to processor 
or coherent I/O module B when a subsequent store to the location by B cannot affect the value 
returned by the load. The store by B must use the same memory access type as the load. 


Performed store: A store instruction is performed when the store is observable. A store on 
processor A is observable with respect to processor or coherent I/O module B when a 
subsequent load of the location by B returns the value written by the store. The load by B must 
use the same memory access type as the store. 

Globally performed load: A load instruction is globally performed when it is performed with 
respect to all processors and coherent I/O modules capable of storing to the location. 


Globally performed store: A store instruction is globally performed when it is globally 
observable. It is globally observable when it is observable by all processors and I/O modules 
capable of loading from the location. 


Coherent I/O module: A coherent I/O module is an Input/Output system component that 
performs coherent Direct Memory Access (DMA). It reads and writes memory independently 
as though it were a processor doing loads and stores to locations with a memory access type of 
cached coherent. 


Restrictions: 


The effect of SYNC on the global order of the effects of loads and stores for memory access 
types other than uncached and cached coherent is not defined. 


Operation: 
SyncOperation(stype) 
Exceptions: 


Reserved Instruction 


Programming Notes: 


A processor executing load and store instructions observes the effects of the loads and stores 
that use the same memory access type in the order that they occur in the instruction stream; this 
is known as program order. A parallel program has multiple instruction streams that can 
execute at the same time on different processors. In multiprocessor (MP) systems, the order in 
which the effects of loads and stores are observed by other processors, the global order of the 
loads and stores, determines the actions necessary to reliably share data in parallel programs. 


When all processors observe the effects of loads and stores in program order, the system is 
strongly ordered. On such systems, parallel programs can reliably share data without explicit 
actions in the programs. For such a system, SYNC has the same effect as a NOP. Executing 
SYNC on such a system is not necessary, but is also not an error. 


If a multiprocessor system is not strongly ordered, the effects of load and store instructions 
executed by one processor may be observed out of program order by other processors. On such 
systems, parallel programs must take explicit actions in order to reliably share data. At critical 
points in the program, the effects of loads and stores from an instruction stream must occur in 
the same order for all processors. SYNC separates the loads and stores executed on the 
processor into two groups and the effects of these groups are seen in program order by all 
processors. The effect of all loads and stores in one group is seen by all processors before the 
effect of any load or store in the other group. In effect, SYNC causes the system to be strongly 
ordered for the executing processor at the instant that the SYNC is executed. 

Many MIPS-based multiprocessor systems are strongly ordered or have a mode in which they 
operate as strongly ordered for at least one memory access type. The MIPS architecture also 
permits MP systems that are not strongly ordered. SYNC enables the reliable use of shared 
memory on such systems. A parallel program that does not use SYNC will generally not operate 
on a system that is not strongly ordered; however, a program that does use SYNC will work on 
both types of systems. System-specific documentation will describe the actions necessary to 
reliably share data in parallel programs for that system. 


The behavior of a load or store using one memory access type is undefined if a load or store was 
previously made to the same physical location using a different memory access type. The 
presence of a SYNC between the references does not alter this behavior. See 1.6.1 Mixing 
References with Different Access Types for a more complete discussion. 


SYNC affects the order in which the effects of load and store instructions appear to all 
processors; it does not generally affect the physical memory-system ordering or synchronization 
issues that arise in system programming. The effect of SYNC on implementation specific 
aspects of the cached memory system, such as writeback buffers, is not defined. The effect of 
SYNC on reads or writes to memory caused by privileged implementation-specific instructions, 
such as CACHE, is not defined. 


Prefetch operations have no effects detectable by user-mode programs so ordering the effects of 
prefetch operations is not meaningful. 

EXAMPLE: These code fragments show how SYNC can be used to coordinate the use of 
shared data between separate writer and reader instruction streams in a multiprocessor 
environment. The FLAG location is used by the instruction streams to determine whether the 
shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA 
to be performed globally before the store to FLAG is performed. The SYNC executed by 
processor B ensures that DATA is not read until after the FLAG value indicates that the shared 
data is valid. 


Processor A (writer) 

# Conditions at entry: 
# The value 0 has been stored in FLAG and that value is observable by B. 
        SW    R1, DATA      # change shared DATA value 
        LI    R2, 1 
        SYNC                # perform DATA store before performing FLAG store 
        SW    R2, FLAG      # say that the shared DATA value is valid 

Processor B (reader) 

        LI    R2, 1 
1:      LW    R1, FLAG      # get FLAG 
        BNE   R2, R1, 1B    # if it says that DATA is not valid, poll again 
        NOP 
        SYNC                # FLAG value checked before doing DATA reads 
        LW    R1, DATA      # read (valid) shared DATA values 
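
For readers more familiar with C, the same writer/reader pattern can be sketched with C11 atomics. This is only an analogy under stated assumptions: the names shared_data, shared_flag, writer and reader are hypothetical, and a sequentially consistent fence stands in for SYNC; how a given compiler maps the fence onto machine instructions is not specified here.

```c
#include <stdatomic.h>
#include <stdint.h>

static int32_t shared_data;     /* plays the role of DATA               */
static atomic_int shared_flag;  /* plays the role of FLAG, initially 0  */

/* Processor A (writer): publish the data, then raise the flag. */
static void writer(int32_t value)
{
    shared_data = value;                                 /* SW R1, DATA */
    atomic_thread_fence(memory_order_seq_cst);           /* SYNC        */
    atomic_store_explicit(&shared_flag, 1,
                          memory_order_relaxed);         /* SW R2, FLAG */
}

/* Processor B (reader): poll the flag, then read the data. */
static int32_t reader(void)
{
    while (atomic_load_explicit(&shared_flag,
                                memory_order_relaxed) != 1)
        ;                                                /* poll FLAG   */
    atomic_thread_fence(memory_order_seq_cst);           /* SYNC        */
    return shared_data;                                  /* LW R1, DATA */
}
```

In the C11 memory model, the fence before the flag store and the fence after the flag load together order the plain access to shared_data, which mirrors the role the two SYNC instructions play in the fragments above.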


Implementation Notes: 


There may be side effects of uncached loads and stores that affect cached coherent load and 
store operations. To permit the reliable use of such side effects, buffered uncached stores that 
occur before the SYNC must be written to memory before cached coherent loads and stores after 
the SYNC may be performed. 
SYSCALL System Call

31 26 25 6 5 0
SPECIAL Code SYSCALL
000000 001100
6 20 6

Format: SYSCALL MIPS I
Purpose: To cause a System Call exception. 
Description: 


A system call exception occurs, immediately and unconditionally transferring control to the 
exception handler. 


The code field is available for use as software parameters, but is retrieved by the exception 
handler only by loading the contents of the memory word containing the instruction. 
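
Once the handler has loaded the instruction word, extracting the code field is a shift and a mask. A minimal C sketch, assuming the handler already holds the 32-bit word in a variable; the function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: true if the word is a SYSCALL instruction,
   i.e. opcode SPECIAL (bits 31..26 = 000000) and function field
   SYSCALL (bits 5..0 = 001100). */
static int is_syscall(uint32_t insn)
{
    return (insn >> 26) == 0u && (insn & 0x3Fu) == 0x0Cu;
}

/* Hypothetical helper: returns the 20-bit code field (bits 25..6). */
static uint32_t syscall_code(uint32_t insn)
{
    return (insn >> 6) & 0xFFFFFu;
}
```

How the handler locates the instruction word (for example, when the SYSCALL sits in a branch delay slot) is system-specific and is not shown.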


Restrictions: 


None 


Operation: 


SignalException(SystemCall) 


Exceptions: 
System Call 
TEQ Trap if Equal

31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt code TEQ
000000 110100
6 5 5 10 6

Format: TEQ rs, rt MIPS II


Purpose: To compare GPRs and do a conditional Trap. 


Description: if (rs = rt) then Trap 


Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is equal to GPR rt 
then take a Trap exception. 


The contents of the code field are ignored by hardware and may be used to encode information 
for system software. To retrieve the information, system software must load the instruction 
word from memory. 


Restrictions: 

None 

Operation: 
if GPR[rs] = GPR[rt] then 
    SignalException(Trap) 
endif 

Exceptions: 
Reserved Instruction 
Trap 
TEQI Trap if Equal Immediate

31 26 25 21 20 16 15 0
REGIMM rs TEQI immediate
000001 01100
6 5 5 16

Format: TEQI rs, immediate MIPS II
Purpose: To compare a GPR to a constant and do a conditional Trap. 


Description: if (rs = immediate) then Trap 
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs 
is equal to immediate then take a Trap exception. 

Restrictions: 


None 


Operation: 
if GPR[rs] = sign_extend(immediate) then 
SignalException(Trap) 
endif 
Exceptions: 


Reserved Instruction 
Trap 
TGE Trap if Greater or Equal

31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt code TGE
000000 110000
6 5 5 10 6

Format: TGE rs, rt MIPS II


Purpose: To compare GPRs and do a conditional Trap. 


Description: if (rs ≥ rt) then Trap


Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is greater than or 
equal to GPR rt then take a Trap exception. 


The contents of the code field are ignored by hardware and may be used to encode information 
for system software. To retrieve the information, system software must load the instruction 
word from memory. 


Restrictions: 


None 


Operation: 
if GPR[rs] ≥ GPR[rt] then 
SignalException(Trap) 
endif 
Exceptions: 


Reserved Instruction 
Trap 
TGEI Trap if Greater or Equal Immediate

31 26 25 21 20 16 15 0
REGIMM rs TGEI immediate
000001 01000
6 5 5 16

Format: TGEI rs, immediate MIPS II
Purpose: To compare a GPR to a constant and do a conditional Trap. 


Description: if (rs ≥ immediate) then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs 
is greater than or equal to immediate then take a Trap exception. 

Restrictions: 


None 


Operation: 
if GPR[rs] ≥ sign_extend(immediate) then 
SignalException(Trap) 
endif 
Exceptions: 


Reserved Instruction 
Trap 
TGEIU Trap If Greater Or Equal Immediate Unsigned

31 26 25 21 20 16 15 0
REGIMM rs TGEIU immediate
000001 01001
6 5 5 16

Format: TGEIU rs, immediate MIPS II
Purpose: To compare a GPR to a constant and do a conditional Trap. 


Description: if (rs ≥ immediate) then Trap
Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers; 
if GPR rs is greater than or equal to immediate then take a Trap exception. 


Because the 16-bit immediate is sign-extended before comparison, the instruction is able to 
represent the smallest or largest unsigned numbers. The representable values are at the 
minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned 
range. 
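
The two representable ranges follow directly from sign-extending the immediate and then reinterpreting it as unsigned. A small C sketch for a 32-bit register, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: sign-extends the 16-bit immediate field to 32
   bits and returns it as the unsigned value used in the comparison. */
static uint32_t trap_imm_unsigned(uint16_t imm_field)
{
    int32_t s = (int32_t)(imm_field ^ 0x8000u) - 0x8000;
    return (uint32_t)s;
}

/* Hypothetical model of the TGEIU trap condition for a 32-bit GPR. */
static bool tgeiu_traps(uint32_t gpr_rs, uint16_t imm_field)
{
    return gpr_rs >= trap_imm_unsigned(imm_field);
}
```

Immediate fields 0x0000 to 0x7FFF map to 0 through 32767, while fields 0x8000 to 0xFFFF map to max_unsigned-32767 through max_unsigned; no value between those two ranges can be encoded.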


Restrictions: 


None 


Operation: 
if (0 || GPR[rs]) ≥ (0 || sign_extend(immediate)) then 
SignalException(Trap) 
endif 


Exceptions: 


Reserved Instruction 
Trap 
TGEU Trap If Greater or Equal Unsigned

31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt code TGEU
000000 110001
6 5 5 10 6

Format: TGEU rs, rt MIPS II


Purpose: To compare GPRs and do a conditional Trap. 


Description: if (rs ≥ rt) then Trap


Compare the contents of GPR rs and GPR rt as unsigned integers; if GPR rs is greater than or 
equal to GPR rt then take a Trap exception. 


The contents of the code field are ignored by hardware and may be used to encode information 
for system software. To retrieve the information, system software must load the instruction 
word from memory. 


Restrictions: 


None 


Operation: 
if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then 
SignalException(Trap) 
endif 
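
Prefixing each operand with a zero bit makes the comparison unsigned. The difference from the signed TGE condition can be sketched in C for 32-bit registers by widening both operands to 64 bits before comparing; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the register value read as a signed integer. */
static int64_t as_signed(uint32_t x)
{
    return (int64_t)(x ^ 0x80000000u) - 0x80000000LL;
}

/* Signed condition, as used by TGE. */
static bool tge_traps(uint32_t rs, uint32_t rt)
{
    return as_signed(rs) >= as_signed(rt);
}

/* Unsigned condition, as used by TGEU: zero-extending into a wider
   type corresponds to the "0 ||" prefix in the operation. */
static bool tgeu_traps(uint32_t rs, uint32_t rt)
{
    return (int64_t)(uint64_t)rs >= (int64_t)(uint64_t)rt;
}
```

With rs = 0xFFFFFFFF and rt = 1, the signed condition is false (-1 is less than 1) while the unsigned condition is true.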
Exceptions: 


Reserved Instruction 
Trap 
TLBP Probe TLB For Matching Entry

31 26 25 24 6 5 0
COP0 CO 0 TLBP
010000 1 000 0000 0000 0000 0000 001000
6 1 19 6

Format: TLBP MIPS I
Description: 


The Index register is loaded with the address of the TLB entry whose contents match the 
contents of the EntryHi register. If no TLB entry matches, the high-order bit of the Index 
register is set (0x80000000), as it is in the R4400 processor. 


The architecture does not specify the operation of memory references associated with the 
instruction immediately after a TLBP instruction, nor is the operation specified if more than one 
TLB entry matches. 


Operation: 32-bit processors

Index ← 1 || 0^25 || undefined^6
for i in 0...TLBEntries-1
    if (TLB[i]_{95..77} = EntryHi_{31..13}) and (TLB[i]_{76} or
        (TLB[i]_{71..64} = EntryHi_{7..0})) then
        Index ← 0^26 || i_{5..0}
    endif
endfor
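
The probe loop can be sketched in C as follows. This is a simplified model under stated assumptions: tlb_entry_t, tlb_probe and PROBE_MISS are hypothetical names, the entry fields stand in for the bit ranges used in the operation (virtual page number, G bit, ASID), and the undefined low-order bits of Index on a miss are simply returned as zero.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the TLB fields used by the probe. */
typedef struct {
    uint32_t vpn2;    /* virtual page number field of the entry */
    uint8_t  asid;    /* address space identifier of the entry  */
    uint8_t  global;  /* G bit: non-zero disables ASID matching */
} tlb_entry_t;

#define PROBE_MISS 0x80000000u  /* high-order bit of Index set */

/* Returns the index of the matching entry, or PROBE_MISS if none.
   As in the operation, a later match overwrites an earlier one; the
   architecture leaves multiple matches unspecified. */
static uint32_t tlb_probe(const tlb_entry_t *tlb, unsigned entries,
                          uint32_t vpn2, uint8_t asid)
{
    uint32_t index = PROBE_MISS;
    for (unsigned i = 0; i < entries; i++) {
        if (tlb[i].vpn2 == vpn2 &&
            (tlb[i].global || tlb[i].asid == asid)) {
            index = i;
        }
    }
    return index;
}
```

System software would typically test the high-order bit of the result before using it as an index.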


Operation: 64-bit processors

Index ← 1 || 0^25 || undefined^6
for i in 0...TLBEntries-1
    if ((TLB[i]_{167..141} and not (0^15 || TLB[i]_{216..205}))
        = (EntryHi_{39..13} and not (0^15 || TLB[i]_{216..205}))) and
        (TLB[i]_{140} or (TLB[i]_{135..128} = EntryHi_{7..0})) then
        Index ← 0^26 || i_{5..0}
    endif
endfor


Exceptions: 


Coprocessor Unusable 
TLBR Read Indexed TLB Entry

31 26 25 24 6 5 0
COP0 CO 0 TLBR
010000 1 000 0000 0000 0000 0000 000001
6 1 19 6

Format: TLBR MIPS I
Description: 


The G bit (which controls ASID matching) read from the TLB is written into both of the 
EntryLo0 and EntryLo1 registers. 


The EntryHi and EntryLo registers are loaded with the contents of the TLB entry pointed at by 
the contents of the TLB Index register. 


In the R4400, this instruction had to be executed in unmapped spaces, and in the R5000 and the 
R10000 processor it can be executed in unmapped spaces without any hazard. In addition, 
TLBR can be executed in mapped spaces. 


Operation: 32-bit processors

PageMask ← TLB[Index_{5..0}]_{127..96}
EntryHi ← TLB[Index_{5..0}]_{95..64} and not TLB[Index_{5..0}]_{127..96}
EntryLo1 ← TLB[Index_{5..0}]_{63..32}
EntryLo0 ← TLB[Index_{5..0}]_{31..0}


Operation: 64-bit processors

PageMask ← TLB[Index_{5..0}]_{255..192}
EntryHi ← TLB[Index_{5..0}]_{191..128} and not TLB[Index_{5..0}]_{255..192}
EntryLo1 ← TLB[Index_{5..0}]_{127..65} || TLB[Index_{5..0}]_{140}
EntryLo0 ← TLB[Index_{5..0}]_{63..1} || TLB[Index_{5..0}]_{140}


Exceptions: 


Coprocessor Unusable 
TLBWI Write Indexed TLB Entry

31 26 25 24 6 5 0
COP0 CO 0 TLBWI
010000 1 000 0000 0000 0000 0000 000010
6 1 19 6

Format: TLBWI MIPS I


Description: 


The G bit of the TLB is written with the logical AND of the G bits in the EntryLo0 and EntryLo1 
registers. 


The TLB entry pointed at by the contents of the TLB Index register is loaded with the contents 
of the EntryHi and EntryLo registers. 


The operation is invalid (and the results are unspecified) if the contents of the TLB Index 
register are greater than the number of TLB entries in the processor. 


In the R4400, this instruction had to be executed in unmapped spaces, and in the R5000 and the 
R10000 processor it can be executed in unmapped spaces without any hazard. 


There is no hazard to executing a TLB write in mapped space unless the write affects those 
instructions that have been fetched and buffered by the processor. If necessary, a flush to the 
instruction-fetch pipeline, such as execution of a jump register instruction, after a TLB write can 
avoid this hazard. 


In the R4400 processor, a TLB write instruction is used to write the whole page frame number 
from the EntryLo registers to the TLB entry. Depending on the page size specified in the 
corresponding PageMask register, the lower bits of PFN may not be used for address 
translation. In the R5000 and the R10000 processor, the lower bits not used for address 
translation are forced to zeroes during a TLB write. This does not affect TLB address 
translation; however, a TLB read may not retrieve what was originally written. 


Operation: 


TLB[Index_{5..0}] ← 
    PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0 
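
The concatenation written to the TLB entry can be sketched in C. This is an illustration under stated assumptions: the register width is taken to be 32 bits, the tlb_words_t type and tlb_write_words function are hypothetical, and the G-bit handling and the zeroing of unused PFN bits described above are not modeled.

```c
#include <stdint.h>

/* Hypothetical container for the four fields that are concatenated
   into one TLB entry, most-significant field first. */
typedef struct {
    uint32_t page_mask;
    uint32_t entry_hi;
    uint32_t entry_lo1;
    uint32_t entry_lo0;
} tlb_words_t;

/* Models: PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0 */
static tlb_words_t tlb_write_words(uint32_t page_mask, uint32_t entry_hi,
                                   uint32_t entry_lo1, uint32_t entry_lo0)
{
    tlb_words_t e;
    e.page_mask = page_mask;
    e.entry_hi  = entry_hi & ~page_mask;  /* masked bits are cleared */
    e.entry_lo1 = entry_lo1;
    e.entry_lo0 = entry_lo0;
    return e;
}
```

Any EntryHi bit that is set in PageMask is cleared in the stored entry, so those bits take no part in later comparisons.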


Exceptions: 


Coprocessor Unusable 
TLBWR Write Random TLB Entry

31 26 25 24 6 5 0
COP0 CO 0 TLBWR
010000 1 000 0000 0000 0000 0000 000110
6 1 19 6

Format: TLBWR MIPS I
Description: 
The G bit of the TLB is written with the logical AND of the G bits in the EntryLo0 and EntryLo1 
registers. 


The TLB entry pointed at by the contents of the TLB Random register is loaded with the 
contents of the EntryHi and EntryLo registers. 


In the R4400, this instruction had to be executed in unmapped spaces, and in the R5000 and the 
R10000 processor it can be executed in unmapped spaces without any hazard. 


There is no hazard to executing a TLB write in mapped space unless the write affects those 
instructions that have been fetched and buffered by the processor. If necessary, a flush to the 
instruction-fetch pipeline, such as execution of a jump register instruction, after a TLB write can 
avoid this hazard. 


In the R4400 processor, a TLB write instruction is used to write the whole page frame number 
from the EntryLo registers to the TLB entry. Depending on the page size specified in the 
corresponding PageMask register, the lower bits of PFN may not be used for address 
translation. In the R5000 and the R10000 processor, the lower bits not used for address 
translation are forced to zeroes during a TLB write. This does not affect TLB address 
translation; however, a TLB read may not retrieve what was originally written. 


Operation: 


TLB[Random_{5..0}] ← 
    PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0 


Exceptions: 


Coprocessor Unusable 
TLT Trap if Less Than

31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt code TLT
000000 110010
6 5 5 10 6

Format: TLT rs, rt MIPS II


Purpose: To compare GPRs and do a conditional Trap. 


Description: if (rs < rt) then Trap 


Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is less than GPR rt 
then take a Trap exception. 


The contents of the code field are ignored by hardware and may be used to encode information 
for system software. To retrieve the information, system software must load the instruction 
word from memory. 


Restrictions: 


None 


Operation: 
if GPR[rs] < GPR[rt] then 
SignalException(Trap) 
endif 
Exceptions: 


Reserved Instruction 
Trap 
TLTI Trap if Less Than Immediate

31 26 25 21 20 16 15 0
REGIMM rs TLTI immediate
000001 01010
6 5 5 16

Format: TLTI rs, immediate MIPS II
Purpose: To compare a GPR to a constant and do a conditional Trap. 


Description: if (rs < immediate) then Trap 
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs 
is less than immediate then take a Trap exception. 

Restrictions: 


None 


Operation: 


if GPR[rs] < sign_extend(immediate) then 
SignalException(Trap) 
endif 


Exceptions: 


Reserved Instruction 
Trap 
TLTIU Trap if Less Than Immediate Unsigned

31 26 25 21 20 16 15 0
REGIMM rs TLTIU immediate
000001 01011
6 5 5 16

Format: TLTIU rs, immediate MIPS II
Purpose: To compare a GPR to a constant and do a conditional Trap. 


Description: if (rs < immediate) then Trap 
Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers; 
if GPR rs is less than immediate then take a Trap exception. 


Because the 16-bit immediate is sign-extended before comparison, the instruction is able to 
represent the smallest or largest unsigned numbers. The representable values are at the 
minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned 
range. 


Restrictions: 


None 


Operation: 
if (0 || GPR[rs]) < (0 || sign_extend(immediate)) then 
SignalException(Trap) 
endif 


Exceptions: 


Reserved Instruction 
Trap 
TLTU Trap if Less Than Unsigned

31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt code TLTU
000000 110011
6 5 5 10 6

Format: TLTU rs, rt MIPS II


Purpose: To compare GPRs and do a conditional Trap. 


Description: if (rs < rt) then Trap 


Compare the contents of GPR rs and GPR rt as unsigned integers; if GPR rs is less than GPR rt 
then take a Trap exception. 


The contents of the code field are ignored by hardware and may be used to encode information 
for system software. To retrieve the information, system software must load the instruction 
word from memory. 


Restrictions: 


None 


Operation: 
if (0 || GPR[rs]) < (0 || GPR[rt]) then 
SignalException(Trap) 
endif 


Exceptions: 


Reserved Instruction 
Trap 
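
Because hardware ignores the code field, a trap handler that wants it has to read the instruction word and extract bits 15..6 itself. The following is a minimal sketch of that extraction; encode_tltu and trap_code are hypothetical helper names, and how the handler obtains the instruction word (for example from the exception program counter) is outside this sketch.

```python
TLTU_FUNCTION = 0b110011  # function field value from the TLTU encoding


def encode_tltu(rs, rt, code=0):
    """Build a TLTU instruction word: SPECIAL opcode, rs, rt, 10-bit code, function."""
    return ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | ((code & 0x3FF) << 6) | TLTU_FUNCTION


def trap_code(word):
    """Extract the 10-bit code field (bits 15..6) from a trap instruction word."""
    return (word >> 6) & 0x3FF
```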
Trap if Not Equal    TNE

 31       26 25     21 20     16 15              6 5          0
   SPECIAL      rs        rt           code            TNE
   000000                                            110110
      6          5         5            10              6

Format: TNE rs, rt        MIPS II


Purpose: To compare GPRs and do a conditional Trap. 


Description: if (rs ≠ rt) then Trap


Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is not equal to GPR rt
then take a Trap exception.


The contents of the code field are ignored by hardware and may be used to encode information
for system software. To retrieve the information, system software must load the instruction
word from memory.


Restrictions: 


None 


Operation: 
if GPR[rs] ≠ GPR[rt] then
SignalException(Trap) 
endif 
Exceptions: 


Reserved Instruction 
Trap 
TNEI    Trap if Not Equal Immediate

 31       26 25     21 20     16 15                            0
   REGIMM       rs       TNEI              immediate
   000001                01110
      6          5         5                   16

Format: TNEI rs, immediate        MIPS II
Purpose: To compare a GPR to a constant and do a conditional Trap. 


Description: if (rs ≠ immediate) then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs 
is not equal to immediate then take a Trap exception. 

Restrictions: 


None 


Operation: 


if GPR[rs] ≠ sign_extend(immediate) then
SignalException(Trap) 
endif 


Exceptions: 


Reserved Instruction 
Trap 


Chapter 1 CPU Instruction Set 


Enter Standby Mode (R5000 only)    WAIT

 31       26 25 24                                6 5          0
    COP0     CO                  0                     WAIT
   010000    1      000 0000 0000 0000 0000           100000
      6      1                  19                       6

Format: WAIT 

Purpose: To put the CPU into Standby Mode. 

Description: 


In Standby Mode, most of the internal clocks are shut down, which freezes the pipeline and
reduces power consumption. See the VR5000 User’s Manual for more details.


Restrictions: 
None 


Operation: 


if SysAD bus is idle then
    Enter Standby Mode
endif

Exceptions: 


Coprocessor Unusable 
XOR    Exclusive OR

 31       26 25     21 20     16 15     11 10      6 5          0
   SPECIAL      rs        rt        rd        0         XOR
   000000                                   00000      100110
      6          5         5         5        5           6

Format: XOR rd, rs, rt        MIPS I
Purpose: To do a bitwise logical EXCLUSIVE OR.


Description: rd ← rs XOR rt


Combine the contents of GPR rs and GPR rt in a bitwise logical exclusive OR operation and 
place the result into GPR rd. 


Restrictions: 


None 


Operation: 
GPR[rd] ← GPR[rs] xor GPR[rt]


Exceptions: 


None 
Exclusive OR Immediate    XORI

 31       26 25     21 20     16 15                            0
    XORI        rs        rt               immediate
   001110
      6          5         5                   16

Format: XORI rt, rs, immediate        MIPS I
Purpose: To do a bitwise logical EXCLUSIVE OR with a constant.


Description: rt ← rs XOR immediate


Combine the contents of GPR rs and the 16-bit zero-extended immediate in a bitwise logical 
exclusive OR operation and place the result into GPR rt. 


Restrictions:


None


Operation:


GPR[rt] ← GPR[rs] xor zero_extend(immediate)


Exceptions:


None 
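
Unlike the trap-immediate instructions, XORI zero-extends its immediate, so only the low 16 bits of the register can change. A minimal model of the Operation line, assuming 64-bit GPRs (the name xori and the constant MASK64 are illustrative):

```python
MASK64 = (1 << 64) - 1  # assumed GPR width of 64 bits


def xori(gpr_rs, immediate):
    """Model GPR[rt] <- GPR[rs] xor zero_extend(immediate)."""
    return (gpr_rs ^ (immediate & 0xFFFF)) & MASK64
```

Applying the same immediate twice restores the original value, and an immediate of 0x8000 flips bit 15 only; the upper 48 bits are never touched.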
1.10 CPU Instruction Formats


A CPU instruction is a single 32-bit aligned word. The major instruction formats are shown
in Figure 1-10.


I-Type (Immediate).

 31       26 25     21 20     16 15                            0
   opcode       rs        rt                offset
      6          5         5                   16

J-Type (Jump).

 31       26 25                                                0
   opcode                        instr_index
      6                              26

R-Type (Register).

 31       26 25     21 20     16 15     11 10      6 5          0
   opcode       rs        rt        rd        sa      function
      6          5         5         5        5           6

opcode        6-bit primary operation code
rd            5-bit destination register specifier
rs            5-bit source register specifier
rt            5-bit target (source/destination) register specifier or used to
              specify functions within the primary opcode value REGIMM
immediate     16-bit signed immediate used for: logical operands, arithmetic
              signed operands, load/store address byte offsets, PC-relative
              branch signed instruction displacement
instr_index   26-bit index shifted left two bits to supply the low-order 28 bits of
              the jump target address.
sa            5-bit shift amount
function      6-bit function field used to specify functions within the primary
              operation code value SPECIAL.



Figure 1-10 CPU Instruction Formats 
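
The field positions in Figure 1-10 translate directly into shifts and masks. The sketch below extracts every field from a 32-bit word regardless of format; decode_fields and jump_target_low28 are illustrative names, and which fields are meaningful depends on the instruction format.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields named in Figure 1-10."""
    word &= 0xFFFFFFFF
    return {
        "opcode": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,          # bits 25..21
        "rt": (word >> 16) & 0x1F,          # bits 20..16
        "rd": (word >> 11) & 0x1F,          # bits 15..11 (R-Type)
        "sa": (word >> 6) & 0x1F,           # bits 10..6  (R-Type)
        "function": word & 0x3F,            # bits 5..0   (R-Type)
        "immediate": word & 0xFFFF,         # bits 15..0  (I-Type)
        "instr_index": word & 0x3FFFFFF,    # bits 25..0  (J-Type)
    }


def jump_target_low28(instr_index):
    """Shift the 26-bit index left two bits to form the low-order 28 address bits."""
    return (instr_index & 0x3FFFFFF) << 2
```

For example, the word 0x00221826 decodes as opcode 0 (SPECIAL), rs 1, rt 2, rd 3, sa 0, function 0x26, which is XOR rd=3, rs=1, rt=2 according to the XOR encoding.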
1.11 CPU Instruction Encoding


1.11.1 Instruction Decode


This section describes the encoding of user-level, i.e. non-privileged, CPU instructions for
the four levels of the MIPS architecture, MIPS I through MIPS IV. Each architecture level
includes the instructions in the previous level;† MIPS IV includes all instructions in MIPS
I, MIPS II, and MIPS III. This section presents eight different views of the instruction
encoding.


• Separate encoding tables for each architecture level.


• A MIPS IV encoding table showing the architecture level at which each opcode
  was originally defined and subsequently modified (if modified).


• Separate encoding tables for each architecture revision showing the changes made
  during that revision.


Instruction field names are printed in bold in this section.


The primary opcode field is decoded first. Most opcode values completely specify an
instruction that has an immediate value or offset. Opcode values that do not specify an
instruction specify an instruction class. Instructions within a class are further specified by
values in other fields. The opcode values SPECIAL and REGIMM specify instruction
classes. The COP0, COP1, COP2, COP3, and COP1X instruction classes are not CPU
instructions; they are discussed in 1.11.3 Non-CPU Instructions in the Tables.


(1) SPECIAL Instruction Class 


The opcode=SPECIAL instruction class encodes 3-register computational instructions,
jump register, and some special purpose instructions. The class is further decoded by
examining the function field. The function values fully specify the CPU instructions; the
MOVCI instruction class is not a CPU instruction class.


(2) REGIMM Instruction Class 


The opcode=REGIMM instruction class encodes conditional branch and trap immediate
instructions. The class is further decoded, and the instructions fully specified, by examining
the rt field.
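
The decode order described above (primary opcode first, then the function field for SPECIAL or the rt field for REGIMM) can be sketched as a two-level lookup. The dictionaries below hold only a handful of encodings taken from the instruction descriptions in this chapter; they are deliberately incomplete, and decode_mnemonic is an illustrative name.

```python
SPECIAL = 0b000000
REGIMM = 0b000001

# Partial maps, limited to encodings quoted in the instruction descriptions.
SPECIAL_FUNCTION = {0b110011: "TLTU", 0b110110: "TNE", 0b100110: "XOR"}
REGIMM_RT = {0b01010: "TLTI", 0b01011: "TLTIU", 0b01110: "TNEI"}
PRIMARY_OPCODE = {0b001110: "XORI"}


def decode_mnemonic(word):
    """Return the mnemonic for a word, or "unknown" if it is not in the partial maps."""
    opcode = (word >> 26) & 0x3F
    if opcode == SPECIAL:
        # SPECIAL class: the function field (bits 5..0) selects the instruction.
        return SPECIAL_FUNCTION.get(word & 0x3F, "unknown")
    if opcode == REGIMM:
        # REGIMM class: the rt field (bits 20..16) selects the instruction.
        return REGIMM_RT.get((word >> 16) & 0x1F, "unknown")
    return PRIMARY_OPCODE.get(opcode, "unknown")
```

A full decoder would fill the three maps from the encoding tables that follow and would map reserved entries to a Reserved Instruction outcome rather than "unknown".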


1.11.2 Instruction Subsets of MIPS III and MIPS IV Processors 


MIPS III processors, such as the R4200, R4300, and R4400, have a processor mode in 
which only the MIPS II instructions are valid. The MIPS II encoding table describes the 
MIPS II-only mode except that the Coprocessor 3 instructions (COP3, LWC3, SWC3, 
LDC3, SDC3) are not available and cause a Reserved Instruction exception. 


MIPS IV processors, such as the R5000 and the R10000, have processor modes in which
only the MIPS II or MIPS III instructions are valid. The MIPS II encoding table describes
the MIPS II-only mode except that the Coprocessor 3 instructions (COP3, LWC3, SWC3,
LDC3, SDC3) are not available and cause a Reserved Instruction exception. The MIPS III
encoding table describes the MIPS III-only mode.


† An exception to this rule is that the reserved, but never implemented, Coprocessor 3 instructions were removed or
changed to another use starting in MIPS III.


1.11.3 Non-CPU Instructions in the Tables 
The encoding tables show all values for the field they describe and by doing this they
include some entries that are not user-level CPU instructions. The primary opcode table
includes coprocessor instruction classes (COP0, COP1, COP2, COP3/COP1X) and
coprocessor load/store instructions (LWCx, SWCx, LDCx, SDCx for x=1, 2, or 3). The
opcode=SPECIAL + function=MOVCI instruction class is an FPU instruction.


(1) Coprocessor 0 - COP0


COP0 encodes privileged instructions for Coprocessor 0, the System Control Coprocessor.
The definition of the System Control Coprocessor is processor-specific and further
information on these instructions is not included in this document.


(2) Coprocessor 1 - COP1, COP1X, MOVCI, and CP1 load/store


Coprocessor 1 is the floating-point unit in the MIPS architecture. COP1, COP1X, and the
(opcode=SPECIAL + function=MOVCI) instruction classes encode floating-point
instructions. LWC1, SWC1, LDC1, and SDC1 are floating-point loads and stores. The
FPU instruction encoding is documented in 2.12 FPU (CP1) Instruction Opcode Bit
Encoding.


(3) Coprocessor 2 - COP2 and CP2 load/store 


Coprocessor 2 is optional and implementation-specific. None of the VR-Series™
processors have implemented coprocessor 2. At this time the VR-Series processors are:
R4200, R4300, R4400, R5000, and R10000.


(4) Coprocessor 3 - COP3 and CP3 load/store 


Coprocessor 3 is optional and implementation-specific in the MIPS I and MIPS II
architecture levels. It was removed from MIPS III and later architecture levels. Note that
in MIPS IV the COP3 primary opcode was reused for the COP1X instruction class. None
of the VR-Series processors have implemented coprocessor 3. At this time the VR-Series
processors are: R4200, R4300, R4400, R5000, and R10000.
Table 1-42 CPU Instruction Encoding - MIPS I Architecture


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SPECIAL δ  REGIMM δ   J          JAL        BEQ        BNE        BLEZ       BGTZ
1 001  ADDI       ADDIU      SLTI       SLTIU      ANDI       ORI        XORI       LUI
2 010  COP0 δ,π   COP1 δ,π   COP2 δ,π   COP3 δ,π,κ *          *          *          *
4 100  LB         LH         LWL        LW         LBU        LHU        LWR        *
5 101  SB         SH         SWL        SW         *          *          SWR        *
6 110  *          LWC1 π     LWC2 π     LWC3 π,κ   *          *          *          *
7 111  *          SWC1 π     SWC2 π     SWC3 π,κ   *          *          *          *

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SLL        *          SRL        SRA        SLLV       *          SRLV       SRAV
1 001  JR         JALR       *          *          SYSCALL    BREAK      *          *
2 010  MFHI       MTHI       MFLO       MTLO       *          *          *          *
3 011  MULT       MULTU      DIV        DIVU       *          *          *          *
4 100  ADD        ADDU       SUB        SUBU       AND        OR         XOR        NOR
5 101  *          *          SLT        SLTU       *          *          *          *

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 00   BLTZ       BGEZ       †          †          †          †          †          †
1 01   †          †          †          †          †          †          †          †
2 10   BLTZAL     BGEZAL     †          †          †          †          †          †
3 11   †          †          †          †          †          †          †          †
Table 1-43 CPU Instruction Encoding - MIPS II Architecture


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SPECIAL δ  REGIMM δ   J          JAL        BEQ        BNE        BLEZ       BGTZ
1 001  ADDI       ADDIU      SLTI       SLTIU      ANDI       ORI        XORI       LUI
2 010  COP0 δ,π   COP1 δ,π   COP2 δ,π   COP3 δ,π,κ BEQL       BNEL       BLEZL      BGTZL
4 100  LB         LH         LWL        LW         LBU        LHU        LWR        *
5 101  SB         SH         SWL        SW         *          *          SWR        ρ
6 110  LL         LWC1 π     LWC2 π     LWC3 π,κ   *          LDC1 π     LDC2 π     LDC3 π,κ
7 111  SC         SWC1 π     SWC2 π     SWC3 π,κ   *          SDC1 π     SDC2 π     SDC3 π,κ

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SLL        *          SRL        SRA        SLLV       *          SRLV       SRAV
1 001  JR         JALR       *          *          SYSCALL    BREAK      *          SYNC
2 010  MFHI       MTHI       MFLO       MTLO       *          *          *          *
3 011  MULT       MULTU      DIV        DIVU       *          *          *          *
4 100  ADD        ADDU       SUB        SUBU       AND        OR         XOR        NOR
5 101  *          *          SLT        SLTU       *          *          *          *
6 110  TGE        TGEU       TLT        TLTU       TEQ        *          TNE        *

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 00   BLTZ       BGEZ       BLTZL      BGEZL      *          *          *          *
1 01   TGEI       TGEIU      TLTI       TLTIU      TEQI       *          TNEI       *
2 10   BLTZAL     BGEZAL     BLTZALL    BGEZALL    *          *          *          *
Table 1-44 CPU Instruction Encoding - MIPS III Architecture


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SPECIAL δ  REGIMM δ   J          JAL        BEQ        BNE        BLEZ       BGTZ
1 001  ADDI       ADDIU      SLTI       SLTIU      ANDI       ORI        XORI       LUI
2 010  COP0 δ,π   COP1 δ,π   COP2 δ,π   *          BEQL       BNEL       BLEZL      BGTZL
3 011  DADDI      DADDIU     LDL        LDR        *          *          *          *
4 100  LB         LH         LWL        LW         LBU        LHU        LWR        LWU
5 101  SB         SH         SWL        SW         SDL        SDR        SWR        ρ
6 110  LL         LWC1 π     LWC2 π     *          LLD        LDC1 π     LDC2 π     LD
7 111  SC         SWC1 π     SWC2 π     *          SCD        SDC1 π     SDC2 π     SD

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SLL        *          SRL        SRA        SLLV       *          SRLV       SRAV
1 001  JR         JALR       *          *          SYSCALL    BREAK      *          SYNC
2 010  MFHI       MTHI       MFLO       MTLO       DSLLV      *          DSRLV      DSRAV
3 011  MULT       MULTU      DIV        DIVU       DMULT      DMULTU     DDIV       DDIVU
4 100  ADD        ADDU       SUB        SUBU       AND        OR         XOR        NOR
5 101  *          *          SLT        SLTU       DADD       DADDU      DSUB       DSUBU
6 110  TGE        TGEU       TLT        TLTU       TEQ        *          TNE        *
7 111  DSLL       *          DSRL       DSRA       DSLL32     *          DSRL32     DSRA32

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 00   BLTZ       BGEZ       BLTZL      BGEZL      *          *          *          *
1 01   TGEI       TGEIU      TLTI       TLTIU      TEQI       *          TNEI       *
2 10   BLTZAL     BGEZAL     BLTZALL    BGEZALL    *          *          *          *
Table 1-45 CPU Instruction Encoding - MIPS IV Architecture


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SPECIAL δ  REGIMM δ   J          JAL        BEQ        BNE        BLEZ       BGTZ
1 001  ADDI       ADDIU      SLTI       SLTIU      ANDI       ORI        XORI       LUI
2 010  COP0 δ,π   COP1 δ,π   COP2 δ,π   COP1X δ,λ  BEQL       BNEL       BLEZL      BGTZL
3 011  DADDI      DADDIU     LDL        LDR        *          *          *          *
4 100  LB         LH         LWL        LW         LBU        LHU        LWR        LWU
5 101  SB         SH         SWL        SW         SDL        SDR        SWR        ρ
6 110  LL         LWC1 π     LWC2 π     PREF       LLD        LDC1 π     LDC2 π     LD
7 111  SC         SWC1 π     SWC2 π     *          SCD        SDC1 π     SDC2 π     SD

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  SLL        MOVCI δ,λ  SRL        SRA        SLLV       *          SRLV       SRAV
1 001  JR         JALR       MOVZ       MOVN       SYSCALL    BREAK      *          SYNC
2 010  MFHI       MTHI       MFLO       MTLO       DSLLV      *          DSRLV      DSRAV
3 011  MULT       MULTU      DIV        DIVU       DMULT      DMULTU     DDIV       DDIVU
4 100  ADD        ADDU       SUB        SUBU       AND        OR         XOR        NOR
5 101  *          *          SLT        SLTU       DADD       DADDU      DSUB       DSUBU
6 110  TGE        TGEU       TLT        TLTU       TEQ        *          TNE        *
7 111  DSLL       *          DSRL       DSRA       DSLL32     *          DSRL32     DSRA32

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 00   BLTZ       BGEZ       BLTZL      BGEZL      *          *          *          *
1 01   TGEI       TGEIU      TLTI       TLTIU      TEQI       *          TNEI       *
2 10   BLTZAL     BGEZAL     BLTZALL    BGEZALL    *          *          *          *
Table 1-46 Architecture Level in Which CPU Instructions are Defined or Extended


The architecture level in which each MIPS IV encoding was defined is indicated by a subscript 1, 2, 3, or 4 (for
architecture level I, II, III, or IV). If an instruction or instruction class was later extended, the extending level
is indicated after the defining level.


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0           1           2           3           4           5           6           7
       000         001         010         011         100         101         110         111
0 000  SPECIAL₁₂₃₄ REGIMM₁₂    J₁          JAL₁        BEQ₁        BNE₁        BLEZ₁       BGTZ₁
1 001  ADDI₁       ADDIU₁      SLTI₁       SLTIU₁      ANDI₁       ORI₁        XORI₁       LUI₁
2 010  COP0₁       COP1₁₂₃₄    COP2₁       COP1X₄      BEQL₂       BNEL₂       BLEZL₂      BGTZL₂
3 011  DADDI₃      DADDIU₃     LDL₃        LDR₃        *           *           *           *
4 100  LB₁         LH₁         LWL₁        LW₁         LBU₁        LHU₁        LWR₁        LWU₃
5 101  SB₁         SH₁         SWL₁        SW₁         SDL₃        SDR₃        SWR₁        ρ₂
6 110  LL₂         LWC1₁       LWC2₁       PREF₄       LLD₃        LDC1₂       LDC2₂       LD₃
7 111  SC₂         SWC1₁       SWC2₁       *₃          SCD₃        SDC1₂       SDC2₂       SD₃

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0           1           2           3           4           5           6           7
       000         001         010         011         100         101         110         111
0 000  SLL₁        MOVCI₄      SRL₁        SRA₁        SLLV₁       *           SRLV₁       SRAV₁
1 001  JR₁         JALR₁       MOVZ₄       MOVN₄       SYSCALL₁    BREAK₁      *           SYNC₂
2 010  MFHI₁       MTHI₁       MFLO₁       MTLO₁       DSLLV₃      *           DSRLV₃      DSRAV₃
3 011  MULT₁       MULTU₁      DIV₁        DIVU₁       DMULT₃      DMULTU₃     DDIV₃       DDIVU₃
4 100  ADD₁        ADDU₁       SUB₁        SUBU₁       AND₁        OR₁         XOR₁        NOR₁
5 101  *           *           SLT₁        SLTU₁       DADD₃       DADDU₃      DSUB₃       DSUBU₃
6 110  TGE₂        TGEU₂       TLT₂        TLTU₂       TEQ₂        *           TNE₂        *
7 111  DSLL₃       *           DSRL₃       DSRA₃       DSLL32₃     *           DSRL32₃     DSRA32₃

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0           1           2           3           4           5           6           7
       000         001         010         011         100         101         110         111
0 00   BLTZ₁       BGEZ₁       BLTZL₂      BGEZL₂      *           *           *           *
1 01   TGEI₂       TGEIU₂      TLTI₂       TLTIU₂      TEQI₂       *           TNEI₂       *
2 10   BLTZAL₁     BGEZAL₁     BLTZALL₂    BGEZALL₂    *           *           *           *
3 11   *           *           *           *           *           *           *           *
Table 1-47 CPU Instruction Encoding Changes - MIPS II Revision


An instruction encoding is shown if the instruction is added in this revision. A period marks an unchanged entry.


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  .          .          .          .          .          .          .          .
1 001  .          .          .          .          .          .          .          .
2 010  .          .          .          .          BEQL       BNEL       BLEZL      BGTZL
3 011  .          .          .          .          .          .          .          .
4 100  .          .          .          .          .          .          .          .
5 101  .          .          .          .          .          .          .          ρ
6 110  LL         .          .          .          .          LDC1 π     LDC2 π     LDC3 π
7 111  SC         .          .          .          .          SDC1 π     SDC2 π     SDC3 π

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  .          .          .          .          .          .          .          .
1 001  .          .          .          .          .          .          .          SYNC
2 010  .          .          .          .          .          .          .          .
3 011  .          .          .          .          .          .          .          .
4 100  .          .          .          .          .          .          .          .
5 101  .          .          .          .          .          .          .          .
6 110  TGE        TGEU       TLT        TLTU       TEQ        .          TNE        .
7 111  .          .          .          .          .          .          .          .

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 00   .          .          BLTZL      BGEZL      .          .          .          .
1 01   TGEI       TGEIU      TLTI       TLTIU      TEQI       .          TNEI       .
2 10   .          .          BLTZALL    BGEZALL    .          .          .          .
3 11   .          .          .          .          .          .          .          .
Table 1-48 CPU Instruction Encoding Changes - MIPS III Revision


An instruction encoding is shown if the instruction is added or modified in this revision. A period marks an unchanged entry.


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  .          .          .          .          .          .          .          .
1 001  .          .          .          .          .          .          .          .
2 010  .          .          .          *          .          .          .          .
                                        (was COP3)
3 011  DADDI      DADDIU     LDL        LDR        .          .          .          .
4 100  .          .          .          .          .          .          .          LWU
5 101  .          .          .          .          SDL        SDR        .          .
6 110  .          .          .          *          LLD        .          .          LD
                                        (was LWC3)                                  (was LDC3)
7 111  .          .          .          *          SCD        .          .          SD
                                        (was SWC3)                                  (was SDC3)

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  .          .          .          .          .          .          .          .
1 001  .          .          .          .          .          .          .          .
2 010  .          .          .          .          DSLLV      .          DSRLV      DSRAV
3 011  .          .          .          .          DMULT      DMULTU     DDIV       DDIVU
4 100  .          .          .          .          .          .          .          .
5 101  .          .          .          .          DADD       DADDU      DSUB       DSUBU
6 110  .          .          .          .          .          .          .          .
7 111  DSLL       .          DSRL       DSRA       DSLL32     .          DSRL32     DSRA32

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 00   .          .          .          .          .          .          .          .
1 01   .          .          .          .          .          .          .          .
2 10   .          .          .          .          .          .          .          .
3 11   .          .          .          .          .          .          .          .
Table 1-49 CPU Instruction Encoding Changes - MIPS IV Revision


An instruction encoding is shown if the instruction is added or modified in this revision. A period marks an unchanged entry.


Instructions encoded by opcode field (rows: bits 31..29, columns: bits 28..26).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  .          .          .          .          .          .          .          .
1 001  .          .          .          .          .          .          .          .
2 010  .          .          .          COP1X δ,λ  .          .          .          .
3 011  .          .          .          .          .          .          .          .
4 100  .          .          .          .          .          .          .          .
5 101  .          .          .          .          .          .          .          .
6 110  .          .          .          PREF       .          .          .          .
7 111  .          .          .          .          .          .          .          .

Instructions encoded by function field when opcode field = SPECIAL (rows: bits 5..3, columns: bits 2..0).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 000  .          MOVCI δ,λ  .          .          .          .          .          .
1 001  .          .          MOVZ       MOVN       .          .          .          .
2 010  .          .          .          .          .          .          .          .
3 011  .          .          .          .          .          .          .          .
4 100  .          .          .          .          .          .          .          .
5 101  .          .          .          .          .          .          .          .
6 110  .          .          .          .          .          .          .          .
7 111  .          .          .          .          .          .          .          .

Instructions encoded by the rt field when opcode field = REGIMM (rows: bits 20..19, columns: bits 18..16).

       0          1          2          3          4          5          6          7
       000        001        010        011        100        101        110        111
0 00   .          .          .          .          .          .          .          .
1 01   .          .          .          .          .          .          .          .
2 10   .          .          .          .          .          .          .          .
3 11   .          .          .          .          .          .          .          .
Key to notes in CPU instruction encoding tables:


*    This opcode is reserved for future use. An attempt to execute it causes a Reserved
     Instruction exception.


†    This opcode is reserved for future use. An attempt to execute it produces an
     undefined result. The result may be a Reserved Instruction exception but this is
     not guaranteed.


δ    (also italic opcode name) This opcode indicates an instruction class. The
     instruction word must be further decoded by examining additional tables that show
     values for another instruction field.


π    This opcode is a coprocessor operation, not a CPU operation. If the processor
     state does not allow access to the specified coprocessor, the instruction causes a
     Coprocessor Unusable exception. It is included in the table because it uses a
     primary opcode in the instruction encoding map.


κ    This opcode is removed in a later revision of the architecture. If a MIPS III or
     MIPS IV processor is operated in MIPS II-only mode this opcode will cause a
     Reserved Instruction exception.


λ    This opcode indicates a class of coprocessor 1 instructions. If the processor state
     does not allow access to coprocessor 1, the opcode causes a Coprocessor Unusable
     exception. It is included in the table because the encoding uses a location in what
     is otherwise a CPU instruction encoding map. Further encoding information for
     this instruction class is in the FPU Instruction Encoding tables.


ρ    This opcode is reserved for Coprocessor 0 (System Control Coprocessor)
     instructions that require base+offset addressing. If the instruction is used for
     COP0 in an implementation, an attempt to execute it without Coprocessor 0 access
     privilege will cause a Coprocessor Unusable exception. If the instruction is not
     used in an implementation, it will cause a Reserved Instruction exception.
FPU Instruction Set


2.1 Introduction


This chapter describes the instruction set architecture (ISA) for the floating-point unit
(FPU) in the MIPS IV architecture. In the MIPS architecture, the FPU is coprocessor 1, an
optional processor implementing IEEE Standard 754† floating-point operations. The FPU
also provides a few additional operations not defined by the IEEE standard.


The original MIPS I FPU ISA has been extended in a backward-compatible fashion three
times. The ISA extensions are inclusive as the diagram illustrates; each new architecture
level (or version) includes the former levels. The description of an architectural feature
includes the architecture level in which the feature is (first) defined or extended. The
feature is also available in all later (higher) levels of the architecture.


[Figure: MIPS Architecture Extensions (MIPS II, MIPS III, MIPS IV)]


† IEEE Standard 754-1985, “IEEE Standard for Binary Floating-Point Arithmetic”
In addition to an ISA, the architecture definition includes processing resources, such as the
coprocessor general register set. The 32-bit registers in MIPS I were changed to 64-bit
registers in MIPS III in a way that is not backwards compatible. For changes such as this,
processors implementing higher levels of the architecture have a way to provide the
processing resource model for earlier levels. For the FPU there is a mode to select the
32-bit or 64-bit register model. The practical result is that a processor implementing MIPS
IV is also able to run MIPS I, MIPS II, or MIPS III binary programs without change.


If coprocessor 1 is not enabled, an attempt to execute a floating-point instruction will cause
a Coprocessor Unusable exception. Enabling coprocessor 1 is a privileged operation
provided by the System Control Coprocessor. Every system environment will either enable
the FPU automatically or provide a means for an application to request that it be enabled.


Before the instruction set is described, there is an overview of the FPU data types, registers, 
and computational model. The FPU instruction set is summarized by functional group then 
each operation is described separately in alphabetical order. The description concludes 
with the FPU instruction formats and opcode encoding tables. See 1.7 Description of an 
Instruction for a description of the organization of the individual instruction descriptions 
and the notation used in them. 


The architecture of the floating-point coprocessor consists of:
• Data types
• Operations
• A computational model
• Processing resources (registers)
• An instruction set


The IEEE standard defines the floating-point number data types, the basic arithmetic, 
comparison, and conversion operations, and a computational model. 


The IEEE standard defines neither specific processing resources nor an instruction set. The
MIPS architecture defines fixed-point (integer) data types, FPU register sets, control and
exception mechanisms, and an instruction set. The architecture includes non-IEEE FPU
control operations, and arithmetic operations (multiply-add, reciprocal, and reciprocal
square root) that may not supply results that match the IEEE precision rules.


The FPU provides both floating-point and fixed-point data types. The single and double
precision floating-point data types are those specified by the IEEE standard. The
fixed-point types are the signed integers provided by the CPU architecture.
2.2.1 Floating-Point Formats


There are two floating-point data types provided by the FPU.
• 32-bit Single precision floating-point (type S)
• 64-bit Double precision floating-point (type D)
The floating-point formats represent numeric values as well as other special entities:


1. Numbers of the form: $(-1)^s \, 2^E \, b_0 . b_1 b_2 \ldots b_{p-1}$
   where (see Table 2-1):
   s = 0 or 1
   E = any integer between E_min and E_max, inclusive
   $b_i$ = 0 or 1 (the high bit, $b_0$, is to the left of the binary point)
   p is the precision
2. Two infinities, +∞ and -∞
3. Signaling non-numbers (SNaNs)
4. Quiet non-numbers (QNaNs)


Table 2-1 Parameters of Floating-Point Formats


parameter                                  Single    Double
bits of mantissa precision, p              24        53
maximum exponent, E_max                    +127      +1023
minimum exponent, E_min                    -126      -1022
exponent bias                              +127      +1023
bits in exponent field, e                  8         11
representation of b0 integer bit           hidden    hidden
bits in fraction field, f                  23        52
total format width in bits                 32        64


The single and double floating-point formats are composed of three fields whose sizes are
listed in Table 2-1. The layouts are pictured in the figures below.


• A 1-bit sign, s.
• A biased exponent, e = E + bias
• A binary fraction, $f = .b_1 b_2 \ldots b_{p-1}$ (the $b_0$ bit is not recorded)


 31   30        23 22                              0
 sign   exponent              fraction
  1        8                     23


Figure 2-1 Single-Precision Floating-Point Format (S)


 63   62        52 51                              0
 sign   exponent              fraction
  1       11                     52


Figure 2-2 Double-Precision Floating-Point Format (D)


Values are encoded in the formats using the unbiased exponent, fraction, and sign values
shown in Table 2-2. The high-order bit of the fraction field, identified as $b_1$, is also
important for NaNs.


Table 2-2 Value of Single or Double Floating-Point Format Encoding


unbiased E        f     s    b1   value v              type of value
E_max + 1         ≠0         1    SNaN                 Signaling NaN
                             0    QNaN                 Quiet NaN
E_max + 1         0     1         -∞                   minus infinity
                        0         +∞                   plus infinity
E_max to E_min          1         -(2^E)(1.f)          negative normalized number
                        0         +(2^E)(1.f)          positive normalized number
E_min - 1         ≠0    1         -(2^E_min)(0.f)      negative denormalized number
                        0         +(2^E_min)(0.f)      positive denormalized number
E_min - 1         0     1         -0                   negative zero
                        0         +0                   positive zero
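
The rows of Table 2-2 can be applied mechanically to a single-precision bit pattern. The sketch below assumes the single format of Figure 2-1 (8-bit exponent, 23-bit fraction) and follows this table's NaN convention, in which b1 = 1 marks a signaling NaN; the function name classify_single is illustrative.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern using the rows of Table 2-2."""
    s = (bits >> 31) & 1          # sign
    e = (bits >> 23) & 0xFF       # biased exponent field
    f = bits & 0x7FFFFF           # fraction field
    b1 = (f >> 22) & 1            # high-order fraction bit
    sign = "negative" if s else "positive"
    if e == 0xFF:                 # unbiased E = E_max + 1
        if f == 0:
            return "minus infinity" if s else "plus infinity"
        return "Signaling NaN" if b1 else "Quiet NaN"
    if e == 0:                    # unbiased E = E_min - 1
        if f == 0:
            return sign + " zero"
        return sign + " denormalized number"
    return sign + " normalized number"
```

With this convention the pattern 7fbf ffff is classified as a Quiet NaN, since its exponent field is all ones, its fraction is nonzero, and b1 is 0.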


(1) Normalized and Denormalized Numbers 


For single and double formats, each representable nonzero numerical value has just one 
encoding; numbers are kept in normalized form. The high-order bit of the p-bit mantissa, 
which lies to the left of the binary point, is “hidden”, and not recorded in the fraction field. 
The encoding rules permit the value of this bit to be determined by looking at the value of 
the exponent. When the unbiased exponent is in the range E_min to E_max, inclusive, the 
number is normalized and the hidden bit must be 1. If the numeric value cannot be 
normalized because the exponent would be less than E_min, then the representation is 
denormalized and the encoded number has an exponent of E_min-1 and the hidden bit has 
the value 0. Plus and minus zero are special cases that are not regarded as denormalized 
values. 
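
The hidden-bit rule can be made concrete by computing the numeric value of a single-precision pattern from its fields. This is a sketch for the single format only, using the Table 2-1 parameters (bias 127, E_min -126); single_value and reference_value are illustrative names, and reference_value relies on the host's IEEE single-precision decoding through Python's struct module as an independent check.

```python
import struct


def single_value(bits):
    """Compute the value of a finite single-precision pattern from s, e and f."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0xFF:
        raise ValueError("infinity or NaN has no finite numeric value")
    if e == 0:
        # Denormalized (or zero): exponent is E_min and the hidden bit is 0.
        exponent, hidden = -126, 0
    else:
        # Normalized: exponent is e - bias and the hidden bit is 1.
        exponent, hidden = e - 127, 1
    mantissa = hidden + f / 2 ** 23
    return (-1) ** s * mantissa * 2.0 ** exponent


def reference_value(bits):
    """Decode the same pattern with the host's single-precision format."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

The smallest positive denormalized pattern, 0000 0001, evaluates to 2^-149, and the smallest normalized pattern, 0080 0000, to 2^-126.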


(2) Reserved Operand Values — Infinity and NaN 


A floating-point operation can signal IEEE exception conditions, such as those caused by 
uninitialized variables, violations of mathematical rules, or results that cannot be 
represented. If a program does not choose to trap IEEE exception conditions, a 
computation that encounters these conditions proceeds without trapping but generates a 
result indicating that an exceptional condition arose during the computation. To permit
this, each floating-point format defines representations, shown in Table 2-2, for +infinity
(+∞), -infinity (-∞), quiet NaN (QNaN), and signaling NaN (SNaN).


Infinity represents a number with magnitude too large to be represented in the format; in
essence it exists to represent a magnitude overflow during a computation. A correctly
signed ∞ is generated as the default result in division by zero and some cases of overflow;
details are in the IEEE exception condition descriptions and Table 2-4 “Default Result for
IEEE Exceptions Not Trapped Precisely”.


Once created as a default result, ∞ can become an operand in a subsequent operation. The
infinities are interpreted such that -∞ < (every finite number) < +∞. Arithmetic with ∞ is
the limiting case of real arithmetic with operands of arbitrarily large magnitude, when such
limits exist. In these cases, arithmetic on ∞ is regarded as exact and exception conditions
do not arise. The out-of-range indication represented by the ∞ is propagated through
subsequent computations. For some cases there is no meaningful limiting case in real
arithmetic for operands of ∞ and these cases raise the Invalid Operation exception
condition. See the description of the Invalid Operation exception for a list of these cases.
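
The limiting-case behavior of ∞ can be observed on any IEEE 754 host with traps disabled. The sketch below uses Python floats (host double precision) purely as an illustration of the rules above; it does not model the FPU's exception flags or trap enables, and limiting_cases is an illustrative name.

```python
import math


def limiting_cases():
    """Evaluate a few operations on infinity using the host's IEEE arithmetic."""
    inf = math.inf
    return {
        "inf_plus_finite": inf + 1.0e308,   # limit exists: result stays +infinity
        "neg_inf_times_two": -inf * 2.0,    # limit exists: result stays -infinity
        "finite_over_inf": 1.0 / inf,       # limit exists: result is zero
        "inf_minus_inf": inf - inf,         # no limiting value: default result is a NaN
    }
```

The first three results are exact, while the last is an invalid operation whose untrapped default result is a NaN.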


SNaN operands cause the Invalid Operation exception for arithmetic operations. SNaNs
are useful values to put in uninitialized variables. SNaN is never produced as a result value.


NOTE: The IEEE 754 Standard states that “Whether copying a signaling NaN
without a change of format signals the invalid operation exception is the
implementor’s option”. The MIPS architecture has chosen to make the formatted
operand move instructions (MOV.fmt, MOVT.fmt, MOVF.fmt, MOVN.fmt, MOVZ.fmt)
non-arithmetic and they do not signal IEEE exceptions.


QNaNs are intended to afford retrospective diagnostic information inherited from invalid 
or unavailable data and results. Propagation of the diagnostic information requires that 
information contained in the QNaNs be preserved through arithmetic operations and 
floating-point format conversions. 


QNaN operands do not cause arithmetic operations to signal an exception. When a floating- 
point result is to be delivered, a QNaN operand causes an arithmetic operation to supply a 
QNaN result. The result QNaN is one of the operand QNaN values when possible. QNaNs 
do have effects similar to SNaNs on operations that do not deliver a floating-point result, 
specifically comparisons. See the detailed description of the floating-point compare 
instruction (C.cond.fmt) for information. 


When certain invalid operations not involving QNaN operands are performed but do not 
cause a trap (because the trap is not enabled), a new QNaN value is created. Table 2-3 
shows the QNaN value generated when no input operand QNaN value can be copied. The 
values listed for the fixed-point formats are the values supplied to satisfy the IEEE standard 
when a QNaN or infinite floating-point value is converted to fixed point. There is no other 
feature of the architecture that detects or makes use of these “integer QNaN” values. 


Table 2-3 Value Supplied when a new Quiet NaN is Created


Format                   New QNaN value

Single floating point    7fbf ffff
Double floating point    7ff7 ffff ffff ffff
Word fixed point         7fff ffff
Longword fixed point     7fff ffff ffff ffff


2.2.2 Fixed-Point Formats


There are two fixed-point data types provided by the FPU.
• 32-bit Word fixed-point (type W)
• 64-bit Longword fixed-point (type L) (defined in MIPS III)


The fixed-point values are held in the two’s complement format used for signed integers in 
the CPU. Unsigned fixed-point data types are not provided in the architecture; application 
software may synthesize computations for unsigned integers from the existing instructions 
and data types. 
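
As a rough illustration of the two's complement encoding described above, the following Python sketch converts between raw W (32-bit) or L (64-bit) bit patterns and signed integers. The helper names `to_signed` and `to_bits` are invented for this example and are not part of the architecture.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret a raw bit pattern as a two's complement integer of `width` bits."""
    bits &= (1 << width) - 1            # keep only the low `width` bits
    sign_bit = 1 << (width - 1)
    return bits - (1 << width) if bits & sign_bit else bits


def to_bits(value: int, width: int) -> int:
    """Encode a signed integer as a two's complement bit pattern of `width` bits."""
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lowest <= value <= highest:
        raise OverflowError(f"{value} does not fit in {width} bits")
    return value & ((1 << width) - 1)


# Word (W) and Longword (L) examples
print(hex(to_bits(-1, 32)))             # all ones
print(to_signed(0x80000000, 32))        # most negative word value
print(to_signed(0x7FFFFFFFFFFFFFFF, 64))
```

An unsigned value can be handled by software in the same way, by masking the pattern instead of sign-extending it.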


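 31   30                            0 
+----+-------------------------------+ 
|sign|              int              | 
+----+-------------------------------+ 
  1                 31 


Figure 2-3 Word Fixed-Point Format (W) 


 63   62                            0 
+----+-------------------------------+ 
|sign|              int              | 
+----+-------------------------------+ 
  1                 63 


Figure 2-4 Longword Fixed-Point Format (L) 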


2.3 Floating-Point Registers 
This section describes the organization and use of the two separate coprocessor 1 (CP1) 
register sets. The coprocessor general registers, also called Floating General Registers 
(FGRs) are used to transfer binary data between the FPU and the rest of the system. The 
general register set is also used to hold formatted FPU operand values. There are only two 
control registers and they are used to identify and control the FPU. 


There are separate 32-bit and 64-bit wide register models. MIPS I defines the 32-bit wide 
register model. MIPS III defines the 64-bit model. To support programs for earlier 
architecture definitions, processors providing the 64-bit MIPS III register model also 
provide the 32-bit wide register model as a mode selection. Selecting 32 or 64-bit register 
model is an implementation-specific privileged operation. 


2.3.1 Organization 
The CP1 register organization for 32-bit and 64-bit register models is shown in Figure 2-5. 
The coprocessor general registers are the same width as the CPU registers. The two defined 
control registers are 32 bits wide. 


FPU General Registers (FGRs) 

              MIPS I                     MIPS III 
              32-bit reg model           64-bit register model 
reg #         bits 31..0                 bits 63..0 
0, 1, 2, 3,   one 32-bit register        one 64-bit register 
... 30, 31    per register number        per register number 


FPU Control Registers (FCRs), bits 31..0 

#0     Implementation and Revision 
#31    FP Control and Status 


Figure 2-5 Coprocessor 1 General Registers (FGRs) 


2.3.2 Binary Data Transfers 


The data transfer instructions move words and doublewords between the CP1 general 
registers and the remainder of the system. The operation of the load and move-to 
instructions is shown in Figure 2-6 and Figure 2-7. The store and move-from instructions 
operate in reverse, reading data from the location that the corresponding load or move-to 
instruction wrote it. 
                 MIPS I                     MIPS III 
                 32-bit reg model           64-bit register model 
                 bits 31..0                 bits 63..32      | bits 31..0 

Initial state 
  reg #0         empty                      empty 
  reg #1         empty                      empty 

After LWC1 f0,0(r0) / MTC1 f0,r0 
  reg #0         data word 0                undefined/unused | data word 0 
  reg #1         empty                      empty 

After LWC1 f1,4(r0) / MTC1 f1,r4 
  reg #0         data word 0                undefined/unused | data word 0 
  reg #1         data word 4                undefined/unused | data word 4 


Figure 2-6 Effect of FPU Word Load or Move-to Operations 


Doubleword transfers to/from 32-bit registers access an aligned pair of CP1 general 
registers with the least-significant word of the doubleword in the lowest-numbered register. 


                 MIPS II                    MIPS III 
                 32-bit reg model           64-bit register model 
                 Loads/Stores operation 
                 (see note below) 
                 bits 31..0                 bits 63..0 

Initial state 
  reg #0         empty                      empty 
  reg #1         empty                      empty 

After LDC1 f0,0(r0) / DMTC1 f0,r0 
  reg #0         lower word (0)             data doubleword 0 
  reg #1         upper word (4)             empty 

After LDC1 f1,8(r0) / DMTC1 f1,r8 
  reg #0         (invalid to load           data doubleword 0 
  reg #1         double to odd register)    data doubleword 8 

NOTE: No 64-bit transfers are defined for the MIPS I 32-bit register model. 
MIPS II defines the 64-bit loads/stores but not 64-bit moves. 


Figure 2-7 Effect of FPU Doubleword Load or Move-to Operations 
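

The pairing rule for doubleword transfers in the 32-bit register model can be sketched in Python as follows. This is only an illustrative model: the list `fgr` standing in for the 32 general registers and the two function names are assumptions made for the example.

```python
def load_doubleword_32bit_model(fgr: list, reg: int, value: int) -> None:
    """Write a 64-bit value into the aligned register pair (reg, reg + 1)."""
    if reg % 2:
        raise ValueError("invalid to load a doubleword to an odd register")
    fgr[reg] = value & 0xFFFFFFFF               # least-significant word, lowest-numbered register
    fgr[reg + 1] = (value >> 32) & 0xFFFFFFFF   # most-significant word


def store_doubleword_32bit_model(fgr: list, reg: int) -> int:
    """Read back the 64-bit value from the location the load wrote it."""
    if reg % 2:
        raise ValueError("invalid to store a doubleword from an odd register")
    return (fgr[reg + 1] << 32) | fgr[reg]


fgr = [0] * 32                                  # hypothetical register file
load_doubleword_32bit_model(fgr, 0, 0x1122334455667788)
print(hex(fgr[0]), hex(fgr[1]))
print(hex(store_doubleword_32bit_model(fgr, 0)))
```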
2.3.3 Formatted Operand Layout 


FPU instructions that operate on formatted operand values specify the floating-point 
register (FPR) that holds a value. An FPR is not necessarily the same as a CP1 general 
register because an FPR is 64 bits wide; if this is wider than the CP1 general registers, an 
aligned set of adjacent CP1 general registers is used as the FPR. The 32-bit register model 
provides 16 FPRs specified by the even CP1 general register numbers. The 64-bit register 
model provides 32 FPRs, one per CP1 general register. Operands that are only 32 bits wide 
(W and S formats), use only half the space in an FPR. The FPR organization and the way 
that operand data is stored in them is shown in the following figures. A summary of the 
data transfer instructions can be found in 2.6.1 Data Transfer Instructions. 
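

A small Python sketch of the FPR numbering rule described above: it returns the CP1 general registers that back a given FPR in each register model. The function name `fgrs_for_fpr` and its `model_bits` parameter are hypothetical, chosen only for this illustration.

```python
def fgrs_for_fpr(fpr: int, model_bits: int) -> tuple:
    """Return the CP1 general register numbers that make up FPR `fpr`."""
    if not 0 <= fpr <= 31:
        raise ValueError("register number must be in 0..31")
    if model_bits == 64:
        return (fpr,)                    # one 64-bit general register per FPR
    if model_bits == 32:
        if fpr % 2:
            raise ValueError("only even register numbers name an FPR in the 32-bit model")
        return (fpr, fpr + 1)            # aligned pair of 32-bit general registers
    raise ValueError("model_bits must be 32 or 64")


print(fgrs_for_fpr(2, 32))
print(fgrs_for_fpr(3, 64))
```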


MIPS I                              MIPS III 
32-bit reg model                    64-bit register model 

FPR #0, 2, ... 30                   FPR #0, 1, 2, 3, ... 30, 31 
(even register numbers only) 

16 x 64-bit operand registers       32 x 64-bit operand registers 
(FPRs)                              (FPRs) 


Figure 2-8 Floating-point Operand Register (FPR) Organization 


          MIPS I                      MIPS III 
          32-bit reg model            64-bit register model 
          bits 31..0                  bits 63..32      | bits 31..0 

reg #0    data word                   undefined/unused | data word 
reg #1    undefined/unused            empty, available to hold an operand 


Figure 2-9 Single Floating Point (S) or Word Fixed (W) Operand in an FPR 
          MIPS I                      MIPS III 
          32-bit reg model            64-bit register model 
          (see note below) 
          bits 31..0                  bits 63..0 

reg #0    lower word                  data doubleword 
reg #1    upper word                  empty, available to hold an operand 


NOTE: MIPS I supports the Double floating-point (D) type; the fixed-point longword (L) 
operand is available starting in MIPS III. 


Figure 2-10 Double Floating Point (D) or Long Fixed (L) Operand in an FPR 


2.3.4 Implementation and Revision Register 


Coprocessor control register 0 contains values that identify the implementation and 
revision of the FPU. Only the low-order two bytes of this register are defined as shown in 
Figure 2-11. 


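 31              16 15             8 7              0 
+------------------+----------------+----------------+ 
|        0         | Implementation |    Revision    | 
+------------------+----------------+----------------+ 
        16                 8                8 


Figure 2-11 FPU Implementation and Revision Register 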


The implementation field identifies a particular FPU part, but the revision number cannot 
be relied on to characterize the FPU functional version. 
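

For illustration, the two defined fields can be extracted from a value read from control register 0 as in the Python sketch below. The function name and the sample value `0x0523` are made up for the example and do not correspond to any particular FPU part.

```python
def decode_implementation_revision(fcr0: int) -> dict:
    """Split the low-order two bytes of FCR0 into its two defined fields."""
    return {
        "implementation": (fcr0 >> 8) & 0xFF,   # bits 15..8
        "revision": fcr0 & 0xFF,                # bits 7..0
    }


print(decode_implementation_revision(0x0523))   # hypothetical register value
```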


2.3.5 FPU Control and Status Register — FCSR 
Coprocessor control register 31 is the FPU Control and Status Register (FCSR). Access to 
the register is not privileged; it can be read or written by any program that can execute 
floating-point instructions. It controls some operations of the coprocessor and shows status 
information: 


• Selects the default rounding mode for FPU arithmetic operations. 
• Selectively enables traps of FPU exception conditions. 
• Controls some denormalized number handling options. 
• Reports IEEE exceptions that arose in the most recently executed instruction. 
• Reports IEEE exceptions that arose, cumulatively, in completed instructions. 
• Indicates the condition code result of FP compare instructions. 
The contents of this register are unpredictable and undefined after a processor reset or a 
power-up event. Software should initialize this register. 


Figure 2-12 MIPS I - FPU Control and Status Register (FCSR) 

Bits      Field     Width   Individual bits (high to low) 
31..24    0         8 
23        c         1 
22..18    0         5 
17..12    cause     6       E V Z O U I 
11..7     enables   5       V Z O U I 
6..2      flags     5       V Z O U I 
1..0      RM        2 


Figure 2-13 MIPS III - FPU Control and Status Register (FCSR) 

Bits      Field     Width   Individual bits (high to low) 
31..25    0         7 
24        FS        1 
23        c         1 
22..18    0         5 
17..12    cause     6       E V Z O U I 
11..7     enables   5       V Z O U I 
6..2      flags     5       V Z O U I 
1..0      RM        2 


Figure 2-14 MIPS IV - FPU Control and Status Register (FCSR) 

Bits      Field     Width   Individual bits (high to low) 
31..25    FCC       7       FCC 7, 6, 5, 4, 3, 2, 1 
24        FS        1 
23        FCC       1       FCC 0 
22..18    0         5 
17..12    cause     6       E V Z O U I 
11..7     enables   5       V Z O U I 
6..2      flags     5       V Z O U I 
1..0      RM        2 

All fields in the FCSR are readable and writable. 


FCC Floating-Point Condition Codes. These bits record the result of FP compares and are 
tested for FP conditional branches; the FCC bit to use is specified in the compare or 
branch instruction. The 0th FCC bit is the same as the c bit in MIPS I. 


FS Flush to Zero. When FS is set, denormalized results are flushed to zero instead of 
causing an unimplemented operation exception. When a denormalized operand 
value is encountered, zero may be used instead of the denorm; this is implementation 
specific. 


c Condition Bit. This bit records the result of FP compares and is tested by FP 
conditional branches. In MIPS IV this becomes the 0th FCC bit. 
cause Cause bits. 
These bits indicate the exception conditions that arise during the execution of an FPU 
arithmetic instruction in precise exception mode. A bit is set to 1 if the corresponding 
exception condition arises during the execution of an instruction and 0 otherwise. By 
reading the register, the exception conditions caused by the preceding FPU 
arithmetic instruction can be determined. The meaning of the individual bits is: 


E   Unimplemented Operation 
V   Invalid Operation 
Z   Divide by Zero 
O   Overflow 
U   Underflow 
I   Inexact Result 


enables Enable bits (see cause field for bit names). 
These bits control, for each of the five conditions individually, whether a trap is taken 
when the IEEE exception condition occurs. The trap occurs when both an enable bit 
and the corresponding cause bit are set during an FPU arithmetic operation or by 
moving a value to the FCSR. The meaning of the individual bits is the same as the 
cause bits. Note that the “E” cause bit has no corresponding enable bit; the non-IEEE 
Unimplemented Operation exception defined by MIPS is always enabled. 


flags Flag bits. (see cause field for bit names) 
This field shows the exception conditions that have occurred for completed 
instructions since it was last reset. For a completed FPU arithmetic operation that 
raises an exception condition the corresponding bits in the flag field are set and the 
others are unchanged. This field is never reset by hardware and must be explicitly 
reset by user software. 


RM Rounding Mode. The rounding mode used for most floating-point operations (some 
FP instructions use a specific rounding mode). The rounding modes are: 


0 RN -- Round to Nearest 
Round result to the nearest representable value. When two representable 
values are equally near, round to the value that has a least significant bit of 
zero (i.e. is even). 


1 RZ -- Round toward Zero 
Round result to the value closest to and not greater in magnitude than the 
result. 


2 RP -- Round toward Plus infinity 
Round result to the value closest to and not less than the result. 


3 RM -- Round toward Minus infinity 
Round result to the value closest to and not greater than the result. 
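

The field layout described above can be summarized with a short Python sketch that unpacks a 32-bit FCSR value using the MIPS IV layout (FCC 7..1 in bits 31..25, FS in bit 24, FCC 0 in bit 23). The function and dictionary key names are assumptions for this illustration only.

```python
ROUNDING_MODES = {0: "RN", 1: "RZ", 2: "RP", 3: "RM"}


def _names(field: int, letters: str) -> list:
    """List the bit names set in `field`; letters[0] names the most-significant bit."""
    width = len(letters)
    return [letters[i] for i in range(width) if (field >> (width - 1 - i)) & 1]


def decode_fcsr(value: int) -> dict:
    """Unpack an FCSR value, assuming the MIPS IV field layout."""
    fcc_high = (value >> 25) & 0x7F      # FCC 7..1
    fcc_zero = (value >> 23) & 1         # FCC 0 (the MIPS I "c" bit)
    return {
        "fcc": (fcc_high << 1) | fcc_zero,                  # bit n of this value is FCC n
        "fs": (value >> 24) & 1,
        "cause": _names((value >> 12) & 0x3F, "EVZOUI"),    # bits 17..12
        "enables": _names((value >> 7) & 0x1F, "VZOUI"),    # bits 11..7
        "flags": _names((value >> 2) & 0x1F, "VZOUI"),      # bits 6..2
        "rm": ROUNDING_MODES[value & 0x3],                  # bits 1..0
    }


sample = (1 << 23) | (1 << 16) | (1 << 11) | (1 << 2) | 1   # made-up register value
print(decode_fcsr(sample))
```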
2.4 Values in FP Registers 


Unlike the CPU, the FPU does not interpret the binary encoding of source operands or 
produce a binary encoding of results for every operation. The value held in a floating-point 
operand register (FPR) has a format, or type, and it may only be used by instructions that 
operate on that format. The format of a value is either uninterpreted, unknown, or one of 
the valid numeric formats: single and double floating-point and word and long fixed-point. 
The way that the formatted value in an FPR is set and changed is summarized in the state 
diagram in Figure 2-15 and is discussed below. 


The value in an FPR is always set when a value is written to the register. When a data 
transfer instruction writes binary data into an FPR (a load), the FPR gets a binary value that 
is uninterpreted. A computational or FP register move instruction that produces a result of 
type fmt puts a value of type fmt into the result register. 


When an FPR with an uninterpreted value is used as a source operand by an instruction that 
requires a value of format fmt, the binary contents are interpreted as an encoded value in 
format fmt and the value in the FPR changes to a value of format fmt. The binary contents 
cannot be reinterpreted in a different format. 


If an FPR contains a value of format fmt, a computational instruction must not use the FPR 
as a source operand of a different format. If this occurs, the value in the register becomes 
unknown and the result of the instruction is also a value that is unknown. Using an FPR 
containing an unknown value as a source operand produces a result that has an unknown 
value. 


The format of the value in the FPR is unchanged when it is read by a data transfer 
instruction (a store). A data transfer instruction produces a binary encoding of the value 
contained in the FPR. If the value in the FPR is unknown, the encoded binary value 
produced by the operation is not defined. 
[State diagram: the value in an FPR is either uninterpreted (binary encoding), a value in 
one of the formats, or unknown. The transitions between these states are labeled Load, 
Store, Src fmt, and Rslt fmt.] 


A, B:      Example formats. 
Load:      Destination of LWC1, LDC1, MTC1, or DMTC1 instructions. 
Store:     Source operand of SWC1, SDC1, MFC1, or DMFC1 instructions. 
Src fmt:   Source operand of computational instruction expecting format “fmt”. 
Rslt fmt:  Result of computational instruction producing value of format “fmt”. 


Figure 2-15 The Effect of FPU Operations on the Format of Values Held in FPRs 
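

The rules in this section can be read as a small state machine. The Python sketch below models it under one assumption that the text does not state explicitly: a register that already holds an unknown value stays unknown when it is read as a source. All names are invented for the example.

```python
UNINTERPRETED = "uninterpreted"
UNKNOWN = "unknown"
FORMATS = ("S", "D", "W", "L")


def after_load() -> str:
    """A data transfer into the FPR leaves an uninterpreted binary value."""
    return UNINTERPRETED


def after_source_use(state: str, fmt: str) -> str:
    """State of a source FPR after a computational instruction reads it as `fmt`."""
    if fmt not in FORMATS:
        raise ValueError("unknown format name")
    if state == UNINTERPRETED or state == fmt:
        return fmt                     # first interpretation, or matching format
    return UNKNOWN                     # different format, or already unknown (assumed)


def result_state(source_states: list, fmt: str) -> str:
    """State of the result FPR given the post-read states of the sources."""
    return UNKNOWN if UNKNOWN in source_states else fmt


def after_store(state: str) -> str:
    """Reading the FPR with a data transfer does not change its format."""
    return state


s = after_source_use(after_load(), "S")     # load, then use as single
print(s, after_source_use(s, "D"))          # reusing it as double is not allowed
```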


2.5 FPU Exceptions 
The IEEE 754 standard specifies that: 


There are five types of exceptions that shall be signaled when detected. 
The signal entails setting a status flag, taking a trap, or possibly doing 
both. With each exception should be associated a trap under user control, 


This function is implemented in the MIPS FPU architecture with the cause, enable, and flag 
fields of the control and status register. The flag bits implement IEEE exception status 
flags, and the cause and enable bits control exception trapping. Each field has a bit for each 
of the five IEEE exception conditions and the cause field has an additional exception bit, 
Unimplemented Operation, used to trap for software emulation assistance. 


There may be two exception modes for the FPU, precise and imprecise, and the operation 
of the FPU when exception conditions arise depends on the exception mode that is 
currently selected. Every processor is able to operate the FPU in the precise exception 
mode. Some processors also have an imprecise exception mode in which floating-point 
performance is greater. Selecting the exception mode, when there is a choice, is a 
privileged implementation-specific operation. 


2.5.1 Precise Exception Mode 


In precise exception mode, an exception (trap) caused by a floating-point operation is 
precise. A precise trap occurs before the instruction that causes the trap, or any following 
instruction, completes and writes results. If desired, the software trap handler can resume 
execution of the interrupted instruction stream after handling the exception. 


The cause bit field reports per-instruction exception conditions. The cause bits are written 
during each floating-point arithmetic operation to show the exception conditions that arose 
during the operation. The bits are set to 1 if the corresponding exception condition arises 
and 0 otherwise. 


A floating-point trap is generated any time both a cause bit and the corresponding enable 
bit are set. This occurs either during the execution of a floating-point operation or by 
moving a value into the FCSR. There is no enable for Unimplemented Operation; this 
exception condition always generates a trap. 


In a trap handler, the exception conditions that arose during the floating-point operation 
that trapped are reported in the cause field. Before returning from a floating-point interrupt 
or exception, or setting cause bits with a move to the FCSR, software must first clear the 
enabled cause bits by a move to the FCSR to prevent the trap from being retaken. User- 
mode programs can never observe enabled cause bits set. If this information is required in 
a user-mode handler, then it must be passed somewhere other than the status register. 


For a floating-point operation that sets only non-enabled cause bits, no trap occurs and the 
default result defined by the IEEE standard is stored (see Table 2-4). When a floating-point 
operation does not trap, the program can see the exception conditions that arose during the 
operation by reading the cause field. 
The flag bit field is a cumulative report of IEEE exception conditions that arise during 
instructions that complete; instructions that trap do not update the flag bits. The flag bits 
are set to 1 if the corresponding IEEE exception is raised and unchanged otherwise. There 
is no flag bit for the MIPS Unimplemented Operation exception condition. The flag bits are 
never cleared as a side effect of floating-point operations, but may be set or cleared by 
moving a new value into the FCSR. 
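

The interaction of the cause, enable, and flag fields in precise exception mode can be sketched as below. This is a simplified model, not a description of any implementation: `raised` is a hypothetical 6-bit mask (E V Z O U I, E highest) of the conditions an operation produced, and the field positions follow the FCSR layout given earlier.

```python
CAUSE_SHIFT, ENABLE_SHIFT, FLAG_SHIFT = 12, 7, 2
E_BIT = 1 << 5          # Unimplemented Operation, within the 6-bit cause field
IEEE_MASK = 0x1F        # V Z O U I


def precise_step(fcsr: int, raised: int) -> tuple:
    """Return (new_fcsr, trap_taken) for one FPU arithmetic operation."""
    raised &= 0x3F
    # The cause field is rewritten by every arithmetic operation.
    fcsr = (fcsr & ~(0x3F << CAUSE_SHIFT)) | (raised << CAUSE_SHIFT)
    enables = (fcsr >> ENABLE_SHIFT) & IEEE_MASK
    # E always traps; an IEEE condition traps only when its enable bit is set.
    trap = bool(raised & E_BIT) or bool(raised & IEEE_MASK & enables)
    if not trap:
        # Only instructions that complete accumulate into the flag field.
        fcsr |= (raised & IEEE_MASK) << FLAG_SHIFT
    return fcsr, trap


overflow_enabled = 0b00100 << ENABLE_SHIFT          # enable bit for O only
print(precise_step(overflow_enabled, 0b000001))     # inexact: completes, sets flag I
print(precise_step(overflow_enabled, 0b000101))     # overflow + inexact: traps
```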


2.5.2 Imprecise Exception Mode 


In imprecise exception mode, an exception (trap) caused by an IEEE floating-point 
operation is imprecise (Unimplemented Operation exceptions must still be signaled 
precisely). An imprecise trap occurs at some point after the exception condition arises. In 
particular, it does not necessarily occur before the instruction that causes the exception, or 
following instructions, have completed and written results. The software trap handler can 
generally neither determine which instruction caused the trap nor continue execution of the 
interrupted instruction stream; it can record the trap that occurred and abort the program. 


The meaning of the cause bit field when reading the FCSR is not defined. When a cause 
bit is written in the FCSR by moving data to it, the corresponding flag bit is also set. 


All floating-point operations, whether they cause a trap or not, complete in the sense that 
they write a result and record exception condition bits in the flag field. When an IEEE 
exception condition arises during an operation, the default result defined by the IEEE 
standard is stored (see Table 2-4). 


A floating-point trap is generated when an exception condition arises during a floating- 
point operation and the corresponding enable bit is set. A trap will also be generated when 
a value with corresponding cause and enable bits set is moved into the FCSR. There is no 
enable for Unimplemented Operation; this exception condition always generates a trap. 


The flag bit field is a cumulative report of IEEE exception conditions that arise during 
instructions that complete. Because all instructions complete in this mode, unlike precise 
exception mode, the flag bits include exception conditions that cause traps. The flag bits 
are set to 1 if the corresponding IEEE exception is raised and unchanged otherwise. There 
is no flag bit for the MIPS Unimplemented Operation exception condition. The flag bits are 
never cleared as a side effect of floating-point operations, but may be set or cleared by 
moving a new value into the FCSR. 


2.5.3 Exception Condition Definitions 


The five exception conditions defined by the IEEE standard are described in this section. 
It also describes the MIPS-defined exception condition, Unimplemented Operation, that is 
used to signal a need for software emulation assistance for an instruction. 


Normally an IEEE arithmetic operation can cause only one exception condition; the only 
cases in which two exceptions can occur at the same time are inexact with overflow and 
inexact with underflow. 


At the program’s direction, an IEEE exception condition can either cause a trap or not. The 
IEEE standard specifies the result to be delivered in case the exception is not enabled and 
no trap is taken. The MIPS architecture supplies these results whenever the exception 
condition does not result in a precise trap (i.e. no trap or an imprecise trap). The default 
action taken depends on the type of exception condition and, in the case of Overflow, 
the current rounding mode. The default result is mentioned in each description and 
summarized in Table 2-4. 


Table 2-4 Default Result for IEEE Exceptions Not Trapped Precisely 


Bit   Description         Default Action 

V     Invalid Operation   Supply a quiet NaN. 

Z     Divide by zero      Supply a properly signed infinity. 

U     Underflow           Supply a rounded result. 

I     Inexact             Supply a rounded result. If caused by an overflow without 
                          the overflow trap enabled, supply the overflowed result. 

O     Overflow            Depends on the rounding mode as shown below. 

      0 (RN)              Supply an infinity with the sign of the intermediate result. 

      1 (RZ)              Supply the format’s largest finite number with the sign of 
                          the intermediate result. 

      2 (RP)              For positive overflow values, supply positive infinity. For 
                          negative overflow values, supply the format’s most negative 
                          finite number. 

      3 (RM)              For positive overflow values, supply the format’s largest 
                          finite number. For negative overflow values, supply minus 
                          infinity. 
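

The overflow rows of Table 2-4 can be expressed as a small Python function. This is a sketch for illustration: the function name and parameters are invented, and `MAX_DOUBLE` uses the host's double format as a stand-in for "the format's largest finite number".

```python
import math
import sys

MAX_DOUBLE = sys.float_info.max      # largest finite double on the host


def overflow_default(rm: int, negative: bool, largest_finite: float = MAX_DOUBLE) -> float:
    """Default overflow result for rounding mode rm (0=RN, 1=RZ, 2=RP, 3=RM)."""
    if rm == 0:   # RN: infinity with the sign of the intermediate result
        return -math.inf if negative else math.inf
    if rm == 1:   # RZ: largest finite number with the sign of the intermediate result
        return -largest_finite if negative else largest_finite
    if rm == 2:   # RP: +infinity, or the most negative finite number
        return -largest_finite if negative else math.inf
    if rm == 3:   # RM: largest finite number, or -infinity
        return -math.inf if negative else largest_finite
    raise ValueError("rounding mode must be 0..3")


for mode in range(4):
    print(mode, overflow_default(mode, False), overflow_default(mode, True))
```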


(1) Invalid Operation exception 


The invalid operation exception is signaled if one or both of the operands are invalid for the 
operation to be performed. The result, when the exception condition occurs without a 
precise trap, is a quiet NaN. The invalid operations are: 


• One or both operands is a signaling NaN (except for the non-arithmetic 
  MOV.fmt, MOVT.fmt, MOVF.fmt, MOVN.fmt, and MOVZ.fmt instructions) 

• Addition or subtraction: magnitude subtraction of infinities, such as: (+∞) + 
  (-∞) or (-∞) - (-∞) 

• Multiplication: 0 × ∞, with any signs 

• Division: 0/0 or ∞/∞, with any signs 

• Square root: An operand less than 0 (-0 is a valid operand value). 

• Conversion of a floating-point number to a fixed-point format when an 
  overflow, or operand value of infinity or NaN, precludes a faithful 
  representation in that format. 

• Some comparison operations in which one or both of the operands is a QNaN 
  value. The definition of the compare operation (C.cond.fmt) has tables 
  showing the comparisons that do and do not signal the exception. 
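

As a rough aid to reading the list above, the following Python sketch classifies the arithmetic cases (add, subtract, multiply, divide, square root). It is deliberately incomplete: Python floats do not distinguish signaling from quiet NaNs, so NaN operands, conversions, and comparisons are not modeled, and the function name is an assumption for the example.

```python
import math


def is_invalid_operation(op: str, a: float, b: float = 0.0) -> bool:
    """True if `op` on operands a, b is one of the listed invalid arithmetic cases."""
    if op in ("add", "sub"):
        if math.isinf(a) and math.isinf(b):
            same_sign = (a > 0) == (b > 0)
            # Magnitude subtraction: unlike signs when adding, like signs when subtracting.
            return same_sign if op == "sub" else not same_sign
        return False
    if op == "mul":
        return (a == 0 and math.isinf(b)) or (math.isinf(a) and b == 0)
    if op == "div":
        return (a == 0 and b == 0) or (math.isinf(a) and math.isinf(b))
    if op == "sqrt":
        return a < 0                 # -0.0 < 0 is False, so -0 is accepted
    raise ValueError("unsupported operation")


print(is_invalid_operation("add", math.inf, -math.inf))
print(is_invalid_operation("sqrt", -0.0))
```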


(2) Division By Zero exception 


The division by zero exception is signaled on an implemented divide operation if the 
divisor is zero and the dividend is a finite nonzero number. The result, when no precise 
trap occurs, is a correctly signed infinity. The divisions (0/0) and (∞/0) do not cause the 
division by zero exception. The result of (0/0) is an Invalid Operation exception condition. 
The result of (∞/0) is a correctly signed infinity. 


(3) Overflow exception 


The overflow exception is signaled when what would have been the magnitude of the 
rounded floating-point result, were the exponent range unbounded, is larger than the 
destination format’s largest finite number. The result, when no precise trap occurs, is 
determined by the rounding mode and the sign of the intermediate result as shown in Table 
2-4. 


(4) Underflow exception 


Two related events contribute to underflow. One is the creation of a tiny non-zero result 
between ±2^Emin which, because it is tiny, may cause some other exception later such as 
overflow on division. The other is extraordinary loss of accuracy during the approximation 
of such tiny numbers by denormalized numbers. The IEEE standard permits a choice in 
how these events are detected, but requires that they must be detected the same way for all 
operations. 


The IEEE standard specifies that “tininess” may be detected either: “after rounding” (when 
a nonzero result computed as though the exponent range were unbounded would lie strictly 
between ±2^Emin), or “before rounding” (when a nonzero result computed as though both 
the exponent range and the precision were unbounded would lie strictly between ±2^Emin). 
The MIPS architecture specifies that tininess is detected after rounding. 


The IEEE standard specifies that loss of accuracy may be detected as either 
“denormalization loss” (when the delivered result differs from what would have been 
computed if the exponent range were unbounded), or “inexact result” (when the delivered 
result differs from what would have been computed if both the exponent range and 
precision were unbounded). The MIPS architecture specifies that loss of accuracy is 
detected as inexact result. 


When an underflow trap is not enabled, underflow is signaled only when both tininess and 
loss of accuracy have been detected. The delivered result might be zero, denormalized, or 
±2^Emin. When an underflow trap is enabled (via the FCSR enable field bit), underflow is 
signaled when tininess is detected regardless of loss of accuracy. 


(5) Inexact exception 


If the rounded result of an operation is not exact or if it overflows without an overflow trap, 
then the inexact exception is signaled. 
(6) Unimplemented Operation exception 


This MIPS defined (non-IEEE) exception is to provide software emulation support. The 
architecture is designed to permit a combination of hardware and software to fully 
implement the architecture. Operations that are not fully supported in hardware cause an 
Unimplemented Operation exception so that software may perform the operation. There is 
no enable bit for this condition; it always causes a trap. After the appropriate emulation or 
other operation is done in a software exception handler, the original instruction stream can 
be continued. 


2.6 Functional Instruction Groups 


The FPU instructions are divided into the following functional groups: 
• Data Transfer 
• Arithmetic 
• Conversion 
• Formatted Operand Value Move 
• Conditional Branch 
• Miscellaneous 


2.6.1 Data Transfer Instructions 


The FPU has two separate register sets: coprocessor general registers and coprocessor 
control registers. The FPU has a load/store architecture; all computations are done on data 
held in coprocessor general registers. The control registers are used to control FPU 
operation. Data is transferred between registers and the rest of the system with dedicated 
load, store, and move instructions. The transferred data is treated as unformatted binary 
data; no format conversions are performed and, therefore, no IEEE floating-point 
exceptions can occur. 


The supported transfer operations are: 


• FPU general reg <-> memory (word/doubleword load/store) 
• FPU general reg <-> CPU general reg (word/doubleword move) 
• FPU control reg <-> CPU general reg (word move) 


All coprocessor loads and stores operate on naturally-aligned data items. An attempt to 
load or store to an address that is not naturally aligned for the data item will cause an 
Address Error exception. Regardless of byte-numbering order (endianness), the address of 
a word or doubleword is the smallest byte address among the bytes in the object. For a big- 
endian machine this is the most-significant byte; for a little-endian machine this is the least- 
significant byte. 
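

The alignment and byte-numbering rules can be illustrated with a short Python sketch. The `memory` buffer and function names are hypothetical, and a Python `ValueError` stands in for the Address Error exception.

```python
import struct


def is_naturally_aligned(address: int, size: int) -> bool:
    """A word (size 4) or doubleword (size 8) must start on a multiple of its size."""
    return address % size == 0


def load_word(memory: bytes, address: int, big_endian: bool) -> int:
    """Read the 32-bit word whose smallest byte address is `address`."""
    if not is_naturally_aligned(address, 4):
        raise ValueError("Address Error: word address is not naturally aligned")
    fmt = ">I" if big_endian else "<I"
    return struct.unpack_from(fmt, memory, address)[0]


memory = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])
print(hex(load_word(memory, 0, big_endian=True)))    # byte at address 0 is most significant
print(hex(load_word(memory, 0, big_endian=False)))   # byte at address 0 is least significant
```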


The FPU has loads and stores using the usual register+offset addressing. For the FPU only, 
there are load and store instructions using register+register addressing. 
MIPS I specifies that loads are delayed by one instruction and that proper execution must 
be ensured by observing an instruction scheduling restriction. The instruction immediately 
following a load into an FPU register Fn must not use Fn as a source register. The time 
between the load instruction and the time the data is available is the “load delay slot”. If 
no useful instruction can be put into the load delay slot, then a null operation (NOP) must 
be inserted. 


In MIPS II, this instruction scheduling restriction is removed. Programs will execute 
correctly when the loaded data is used by the instruction following the load, but this may 
require extra real cycles. Most processors cannot actually load data quickly enough for 
immediate use and the processor will be forced to wait until the data is available. 
Scheduling load delay slots is desirable for performance reasons even when it is not 
necessary for correctness. 


Table 2-5 FPU Loads and Stores Using Register + Offset Address Mode 


Mnemonic   Description                                   Defined in 

LWC1       Load Word to Floating-Point                   MIPS I 
SWC1       Store Word to Floating-Point                  I 
LDC1       Load Doubleword to Floating-Point             II 
SDC1       Store Doubleword to Floating-Point            II 


Table 2-6 FPU Loads and Stores Using Register + Register Address Mode 


Mnemonic   Description                                   Defined in 

LWXC1      Load Word Indexed to Floating-Point           MIPS IV 
SWXC1      Store Word Indexed to Floating-Point          IV 
LDXC1      Load Doubleword Indexed to Floating-Point     IV 
SDXC1      Store Doubleword Indexed to Floating-Point    IV 


Table 2-7 FPU Move To/From Instructions 


Mnemonic   Description                                   Defined in 

MTC1       Move Word To Floating-Point                   MIPS I 
MFC1       Move Word From Floating-Point                 I 
DMTC1      Doubleword Move To Floating-Point             III 
DMFC1      Doubleword Move From Floating-Point           III 
CTC1       Move Control Word To Floating-Point           I 
CFC1       Move Control Word From Floating-Point         I 
2.6.2 Arithmetic Instructions 


The arithmetic instructions operate on formatted data values. The result of most floating- 
point arithmetic operations meets the IEEE standard specification for accuracy; a result 
which is identical to an infinite-precision result rounded to the specified format, using the 
current rounding mode. The rounded result differs from the exact result by less than one 
unit in the least-significant place (ulp). 
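

To make the ulp bound concrete, the Python sketch below measures the distance between an exact rational value and its rounded double, in units of the last place. It uses the host's double arithmetic as a stand-in for an IEEE-conforming FPU, which is an assumption; `math.ulp` requires Python 3.9 or later, and the function name is invented for the example.

```python
import math
from fractions import Fraction


def error_in_ulps(exact: Fraction, rounded: float) -> Fraction:
    """Distance between the exact value and the rounded float, measured in ulps."""
    return abs(Fraction(rounded) - exact) / Fraction(math.ulp(rounded))


quotient = 1.0 / 3.0                                 # rounded result of a divide
print(float(error_in_ulps(Fraction(1, 3), quotient)))
print(error_in_ulps(Fraction(1, 3), quotient) < 1)   # within one ulp of the exact result
```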


Table 2-8 FPU IEEE Arithmetic Operations 


Mnemonic     Description                      Defined in 

ADD.fmt      Floating-Point Add               MIPS I 
SUB.fmt      Floating-Point Subtract          I 
MUL.fmt      Floating-Point Multiply          I 
DIV.fmt      Floating-Point Divide            I 
ABS.fmt      Floating-Point Absolute Value    I 
NEG.fmt      Floating-Point Negate            I 
SQRT.fmt     Floating-Point Square Root       I 
C.cond.fmt   Floating-Point Compare           I, IV 


Two operations, Reciprocal Approximation (RECIP) and Reciprocal Square Root 
Approximation (RSQRT), may be less accurate than the IEEE specification. The result of 
RECIP differs from the exact reciprocal by no more than one ulp. The result of RSQRT 
differs by no more than two ulp. Within these error limits, the result of these instructions 
is implementation specific. 


Table 2-9 FPU Approximate Arithmetic Operations 


Mnemonic    Description                                           Defined in 

RECIP.fmt   Floating-Point Reciprocal Approximation               MIPS IV 
RSQRT.fmt   Floating-Point Reciprocal Square Root Approximation   IV 


There are four compound-operation instructions that perform variations of multiply- 
accumulate: multiply two operands and accumulate to a third operand to produce a result. 
The accuracy of the result depends on which of two alternative arithmetic models is used for 
the computation. The unrounded model is more accurate than a pair of IEEE operations 
and the rounded model meets the IEEE specification. 
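

The difference between the two arithmetic models can be demonstrated numerically. The Python sketch below is an illustration on the host's double format, not a description of any processor: the rounded model is taken to be a multiply followed by an add, each rounded, and the unrounded model is taken to round only once after exact arithmetic. Both function names are invented here.

```python
from fractions import Fraction


def madd_rounded(a: float, b: float, c: float) -> float:
    """Product is rounded, then the sum is rounded (a pair of IEEE operations)."""
    return (a * b) + c


def madd_unrounded(a: float, b: float, c: float) -> float:
    """Exact product and sum, rounded once to double at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision.
print(madd_rounded(a, b, c))      # the small term is lost in the intermediate rounding
print(madd_unrounded(a, b, c))    # the small term survives
```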


Table 2-10 FPU Multiply-Accumulate Arithmetic Operations 


Mnemonic    Description                                 Defined in 

MADD.fmt    Floating-Point Multiply Add                 MIPS IV 
MSUB.fmt    Floating-Point Multiply Subtract            IV 
NMADD.fmt   Floating-Point Negative Multiply Add        IV 
NMSUB.fmt   Floating-Point Negative Multiply Subtract   IV 
2.6.3 Conversion Instructions 


There are instructions to perform conversions among the floating-point and fixed-point 
data types. Each instruction converts values from a number of operand formats to a 
particular result format. Some convert instructions use the rounding mode specified in the
Floating Control and Status Register (FCSR); others specify the rounding mode directly.


Table 2-11 FPU Conversion Operations Using the FCSR Rounding Mode


Mnemonic      Description                                         Defined in

CVT.S.fmt     Floating-Point Convert to Single Floating-Point     MIPS I, III
CVT.D.fmt     Floating-Point Convert to Double Floating-Point     I, III
CVT.W.fmt     Floating-Point Convert to Word Fixed-Point          I
CVT.L.fmt     Floating-Point Convert to Long Fixed-Point          III


Table 2-12 FPU Conversion Operations Using a Directed Rounding Mode


Mnemonic        Description                                     Defined in

ROUND.W.fmt     Floating-Point Round to Word Fixed-Point        MIPS II
ROUND.L.fmt     Floating-Point Round to Long Fixed-Point        III
TRUNC.W.fmt     Floating-Point Truncate to Word Fixed-Point     II
TRUNC.L.fmt     Floating-Point Truncate to Long Fixed-Point     III
CEIL.W.fmt      Floating-Point Ceiling to Word Fixed-Point      II
CEIL.L.fmt      Floating-Point Ceiling to Long Fixed-Point      III
FLOOR.W.fmt     Floating-Point Floor to Word Fixed-Point        II
FLOOR.L.fmt     Floating-Point Floor to Long Fixed-Point        III
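

The four directed rounding modes differ only for values that are not already integers. As a rough illustration, the sketch below maps each conversion family to a Python built-in with the same rounding direction; the `convert` helper is hypothetical and ignores range checks and exceptions.

```python
import math


def convert(value: float) -> dict:
    """Return the integer each directed-rounding conversion would produce."""
    return {
        "ROUND": round(value),       # to nearest, ties to even
        "TRUNC": math.trunc(value),  # toward zero
        "CEIL": math.ceil(value),    # toward +infinity
        "FLOOR": math.floor(value),  # toward -infinity
    }


positive = convert(2.5)
negative = convert(-2.5)
```

Tie cases such as 2.5 and -2.5 make the distinction between the modes visible.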


2.6.4 Formatted Operand Value Move Instructions


These instructions all move formatted operand values among FPU general registers. A
particular operand type must be moved by the instruction that handles that type. There are
three kinds of move instructions:


• Unconditional move
• Conditional move that tests an FPU condition code
• Conditional move that tests a CPU general register value against zero


The conditional move instructions operate in a way that may be unexpected. They always 
force the value in the destination register to become a value of the format specified in the 
instruction. If the destination register does not contain an operand of the specified format 
before the conditional move is executed, the contents become undefined. There is more 
information in 2.4 Values in FP Registers and in the individual descriptions of the 
conditional move instructions themselves. 


Table 2-13 FPU Formatted Operand Move Instructions


Mnemonic      Description                                    Defined in

MOV.fmt       Floating-Point Move                            MIPS I


Table 2-14 FPU Conditional Move on True/False Instructions


Mnemonic      Description                                    Defined in

MOVT.fmt      Floating-Point Move Conditional on FP True     MIPS IV
MOVF.fmt      Floating-Point Move Conditional on FP False    IV


Table 2-15 FPU Conditional Move on Zero/Nonzero Instructions


Mnemonic      Description                                    Defined in

MOVZ.fmt      Floating-Point Move Conditional on Zero        MIPS IV
MOVN.fmt      Floating-Point Move Conditional on Nonzero     IV


2.6.5 Conditional Branch Instructions 


The FPU has PC-relative conditional branch instructions that test condition codes set by
FPU compare instructions (C.cond.fmt).


All branches have an architectural delay of one instruction. When a branch is taken, the 
instruction immediately following the branch instruction, in the branch delay slot, is 
executed before the branch to the target instruction takes place. Conditional branches come 
in two versions that treat the instruction in the delay slot differently when the branch is not 
taken and execution falls through. The “branch” instructions execute the instruction in the 
delay slot, but the “branch likely” instructions do not (they are said to nullify it). 
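
The delay-slot rule for the two branch versions reduces to a single condition. The following sketch states it as a predicate; the function name is hypothetical.

```python
def delay_slot_executes(taken: bool, likely: bool) -> bool:
    """True if the instruction in the branch delay slot is executed.

    A taken branch always executes its delay slot. When the branch falls
    through, only the "branch likely" form nullifies the delay slot.
    """
    return taken or not likely
```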


MIPS I defines a single condition code which is implicit in the compare and branch 
instructions. MIPS IV defines seven additional condition codes and includes the condition 
code number in the compare and branch instructions. The MIPS IV extension keeps the 
original condition bit as condition code zero and the extended encoding is compatible with 
the MIPS I encoding. 


Table 2-16 FPU Conditional Branch Instructions


Mnemonic      Description                    Defined in

BC1T          Branch on FP True              MIPS I, IV
BC1F          Branch on FP False             I, IV
BC1TL         Branch on FP True Likely       II, IV
BC1FL         Branch on FP False Likely      II, IV


2.6.6 Miscellaneous Instructions 


(1) CPU Conditional Move 


There are instructions to conditionally move one CPU general register to another
based on an FPU condition code.


Table 2-17 CPU Conditional Move on FPU True/False Instructions


Mnemonic      Description                      Defined in

MOVT          Move Conditional on FP True      MIPS IV
MOVF          Move Conditional on FP False     IV


2.7 Valid Operands for FP Instructions


The floating-point unit arithmetic, conversion, and operand move instructions operate on
formatted values with different precision and range limits and produce formatted values for 
results. Each representable value in each format has a binary encoding that is read from or 
stored to memory. The fmt or fmt3 field of the instruction encodes the operand format 
required for the instruction. A conversion instruction specifies the result type in the 
function field; the result of other operations is the same format as the operands. The 
encoding of the fmt and fmt3 fields is shown in Table 2-18. 


Table 2-18 FPU Operand Format Field (fmt, fmt3) Decoding


                 Instruction    Size
fmt      fmt3    Mnemonic       name      bits    data type

0-15     -       Reserved
16       0       S              single    32      floating-point
17       1       D              double    64      floating-point
18-19    2-3     Reserved
20       4       W              word      32      fixed-point
21       5       L              long      64      fixed-point
22-31    6-7     Reserved
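

The decoding in Table 2-18 can be expressed as a small lookup. This is a sketch only: the names `FMT_TABLE`, `decode_fmt`, and `decode_fmt3` are hypothetical, and the relation "fmt = fmt3 + 16" is an observation drawn from the table rows rather than a statement made by the architecture.

```python
# fmt value -> (mnemonic, size name, size in bits, data type), from Table 2-18
FMT_TABLE = {
    16: ("S", "single", 32, "floating-point"),
    17: ("D", "double", 64, "floating-point"),
    20: ("W", "word", 32, "fixed-point"),
    21: ("L", "long", 64, "fixed-point"),
}


def decode_fmt(fmt: int):
    """Decode a 5-bit fmt field; returns None for reserved encodings."""
    if not 0 <= fmt <= 31:
        raise ValueError("fmt is a 5-bit field")
    return FMT_TABLE.get(fmt)


def decode_fmt3(fmt3: int):
    """Decode a 3-bit fmt3 field; returns None for reserved encodings."""
    if not 0 <= fmt3 <= 7:
        raise ValueError("fmt3 is a 3-bit field")
    # In the table, each valid fmt3 value equals the matching fmt value minus 16.
    return FMT_TABLE.get(fmt3 + 16)
```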


Each type of arithmetic or conversion instruction is valid for operands of selected formats. 
A summary of the computational and operand move instructions and the formats valid for 
each of them is listed in Table 2-19. Implementations must support combinations that are 
valid either directly in hardware or through emulation in an exception handler. 


The result of an instruction using operand formats marked “U” is not currently specified by 
this architecture and will cause an exception. They are available for future extensions to 
the architecture. The exact exception mechanism used is processor specific. Most 
implementations report this as an Unimplemented Operation for a Floating Point exception. 
Other implementations report these combinations as Reserved Instruction exceptions. 


Operand formats marked “i” are invalid, and an attempt to execute an instruction that uses
them has an undefined result.


Table 2-19 Valid Formats for FPU Operations


                                                       operand fmt     COP1       COP1X
Mnemonic      Operation                                float   fixed   function   op4
                                                       S   D   W   L   value      value

ABS           Absolute value                           •   •   U   U   5
ADD           Add                                      •   •   U   U   0
C.cond        Floating-point compare                   •   •   U   U   48-63
CEIL.L,       Convert to longword fixed-point,         •   •   i   i   10 (14)
(CEIL.W)      round toward +∞
CVT.D         Convert to double floating-point         •   i   •   •   33
CVT.L         Convert to longword fixed-point          •   •   i   i   37
CVT.S         Convert to single floating-point         i   •   •   •   32
CVT.W         Convert to 32-bit fixed-point            •   •   i   i   36
DIV           Divide                                   •   •   U   U   3
FLOOR.L,      Convert to longword fixed-point,         •   •   i   i   11 (15)
(FLOOR.W)     round toward -∞
MADD          Multiply-Add                             •   •   U   U              4
MOV           Move Register                            •   •   i   i   6
MOVC          FP Move Conditional on condition         •   •   i   i   17
MOVN          FP Move Conditional on GPR ≠ zero        •   •   i   i   19
MOVZ          FP Move Conditional on GPR = zero        •   •   i   i   18
MSUB          Multiply-Subtract                        •   •   U   U              5
MUL           Multiply                                 •   •   U   U   2
NEG           Negate                                   •   •   U   U   7
NMADD         Negative multiply-Add                    •   •   U   U              6
NMSUB         Negative multiply-Subtract               •   •   U   U              7
RECIP         Reciprocal approximation                 •   •   U   U   21
ROUND.L,      Convert to longword fixed-point,         •   •   i   i   8 (12)
(ROUND.W)     round to nearest/even
RSQRT         Reciprocal square root approximation     •   •   U   U   22
SQRT          Square root                              •   •   U   U   4
SUB           Subtract                                 •   •   U   U   1
TRUNC.L,      Convert to longword fixed-point,         •   •   i   i   9 (13)
(TRUNC.W)     round toward zero

Key: • = Valid. U = Unimplemented or Reserved. i = Invalid.


2.8 Description of an Instruction 


For the FPU instruction detail documentation, all variable subfields in an instruction format 
(such as fs, ft, immediate, and so on) are shown in lower-case. The instruction name (such 
as ADD, SUB, and so on) is shown in upper-case. 


For the sake of clarity, we sometimes use an alias for a variable subfield in the formats of 
specific instructions. For example, we use rs = base in the format for load and store 
instructions. Such an alias is always lower case, since it refers to a variable subfield. 


In some instructions, the instruction subfields op and function can have constant 6-bit
values. When reference is made to these instructions, upper-case mnemonics are used. For
instance, in the floating-point ADD instruction we use op = COP1 and function = ADD. In
other cases, a single field has both fixed and variable subfields, so the name contains both
upper and lower case characters. Bit encodings for mnemonics are shown at the end of this
section, and are also included with each individual instruction.


2.9 Operation Notation Conventions and Functions 


The instruction description includes an Operation section that describes the operation of 
the instruction in a pseudocode. The pseudocode and terms used in the description are 
described in 1.8 Operation Section Notation and Functions. 


2.10 Individual FPU Instruction Descriptions 


The FP instructions are described in alphabetic order. See 1.7 Description of an
Instruction for a description of the information in each instruction description.


ABS.fmt                                            Floating-Point Absolute Value

 31      26 25      21 20      16 15      11 10       6 5        0
 COP1       fmt        0          fs         fd         ABS
 010001                00000                            000101
 6          5          5          5          5          6

Format:      ABS.S fd, fs                                MIPS I
             ABS.D fd, fs

Purpose:     To compute the absolute value of an FP value.

Description: fd ← absolute(fs)


The absolute value of the value in FPR fs is placed in FPR fd. The operand and result are values
in format fmt.


This operation is arithmetic; a NaN operand signals invalid operation. 


Restrictions: 


The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point
Registers. If they are not valid, the result is undefined.


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, fmt, AbsoluteValue(ValueFPR(fs, fmt))) 
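

To show how the fields in the encoding diagram combine into a 32-bit instruction word, here is a minimal sketch. The function `encode_cop1` and the constant names are hypothetical; the field positions and the values COP1 = 010001, ABS = 000101, ADD = 000000, S = 16, and D = 17 are taken from this chapter.

```python
COP1 = 0b010001          # major opcode, bits 31..26
FMT_S, FMT_D = 16, 17    # fmt field values from Table 2-18
FUNC_ADD = 0b000000      # function field for ADD.fmt
FUNC_ABS = 0b000101      # function field for ABS.fmt


def encode_cop1(fmt: int, ft: int, fs: int, fd: int, function: int) -> int:
    """Pack the COP1 arithmetic fields into a 32-bit instruction word."""
    for name, value, width in (("fmt", fmt, 5), ("ft", ft, 5), ("fs", fs, 5),
                               ("fd", fd, 5), ("function", function, 6)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return (COP1 << 26) | (fmt << 21) | (ft << 16) | (fs << 11) | (fd << 6) | function


# ABS.S $f2, $f4: the ft field is 0 for a one-operand instruction.
abs_s = encode_cop1(FMT_S, 0, 4, 2, FUNC_ABS)
# ADD.D $f0, $f2, $f4
add_d = encode_cop1(FMT_D, 4, 2, 0, FUNC_ADD)
```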


Exceptions: 
Coprocessor Unusable 


Reserved Instruction 


Floating-Point 
Unimplemented Operation 
Invalid Operation 



Chapter 2 FPU Instruction Set 


ADD.fmt                                                      Floating-Point Add

 31      26 25      21 20      16 15      11 10       6 5        0
 COP1       fmt        ft         fs         fd         ADD
 010001                                                 000000
 6          5          5          5          5          6

Format:      ADD.S fd, fs, ft                            MIPS I
             ADD.D fd, fs, ft

Purpose:     To add FP values.

Description: fd ← fs + ft
The value in FPR ft is added to the value in FPR fs. The result is calculated to infinite precision, 
rounded according to the current rounding mode in FCSR, and placed into FPR fd. The 
operands and result are values in format fmt. 


Restrictions: 
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 
StoreFPR (fd, fmt, ValueFPR(fs, fmt) + ValueFPR(ft, fmt)) 


Exceptions: 
Coprocessor Unusable 


Reserved Instruction 


Floating-Point 
Unimplemented Operation 
Invalid Operation 
Inexact 
Overflow 
Underflow


BC1F                                                      Branch on FP False

 31      26 25      21 20   18 17  16 15                        0
 COP1       BC         cc      nd  tf           offset
 010001     01000              0   0
 6          5          3       1   1            16

Format:      BC1F offset (cc = 0 implied)                MIPS I
             BC1F cc, offset                             MIPS IV

Purpose:     To test an FP condition code and do a PC-relative conditional branch.

Description: if (cc = 0) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 
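

The target address calculation described above can be sketched as follows. The function name and the default `gprlen` of 32 are assumptions made for illustration; the arithmetic follows the description: sign-extend the 16-bit offset, shift it left 2 bits, and add it to the address of the delay slot.

```python
def branch_target(branch_pc: int, offset_field: int, gprlen: int = 32) -> int:
    """Compute the effective target address of a PC-relative FP branch."""
    if not 0 <= offset_field <= 0xFFFF:
        raise ValueError("offset is a 16-bit field")
    # Sign-extend the 16-bit field to a signed integer.
    signed = offset_field - 0x10000 if offset_field & 0x8000 else offset_field
    delay_slot_pc = branch_pc + 4            # instruction following the branch
    # Wrap the sum to the width of the address (GPRLEN bits).
    return (delay_slot_pc + (signed << 2)) & ((1 << gprlen) - 1)
```

The largest forward displacement is 0x7FFF words (131068 bytes) and the largest backward displacement is 0x8000 words (131072 bytes), which matches the stated range of about 128 KBytes in either direction.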


If the FP condition code bit cc is false (0), branch to the effective target address after the
instruction in the delay slot is executed.


An FP condition code is set by the FP compare instruction, C.cond.fmt. 


The MIPS I architecture defines a single floating-point condition code, implemented as the
coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control and Status register.
MIPS I, II, and III architectures must have the cc field set to 0, which is implied by the first
format in the Format section.


The MIPS IV architecture adds seven more condition code bits to the original condition code 0. 
FP compare and conditional branch instructions specify the condition code bit to set or test. 
Both assembler formats are valid for MIPS IV. 


Restrictions: 
MIPS I, II, III: There must be at least one instruction between the compare instruction that 
sets a condition code and the branch instruction that tests it. Hardware does not detect a 
violation of this restriction. 


MIPS IV: None.


Operation: 


MIPS I, II, and III define a single condition code; MIPS IV adds 7 more condition codes. This
operation specification is for the general “Branch On Condition” operation with the tf (true/
false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL,
BC1T, and BC1TL have specific values for tf and nd.


MIPS I:
I-1:  condition ← COC[1] = tf
I:    target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      endif


MIPS II and MIPS III:
I-1:  condition ← COC[1] = tf
I:    target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


MIPS IV:
I:    condition ← FCC[cc] = tf
      target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 


Unimplemented Operation 


Programming Notes: 
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.


BC1FL                                              Branch on FP False Likely

 31      26 25      21 20   18 17  16 15                        0
 COP1       BC         cc      nd  tf           offset
 010001     01000              1   0
 6          5          3       1   1            16

Format:      BC1FL offset (cc = 0 implied)               MIPS II
             BC1FL cc, offset                            MIPS IV

Purpose:     To test an FP condition code and do a PC-relative conditional branch; execute the
             delay slot only if the branch is taken.

Description: if (cc = 0) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the FP condition code bit cc is false (0), branch to the effective target address after the 
instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay 
slot is not executed. 


An FP condition code is set by the FP compare instruction, C.cond.fmt. 


The MIPS I architecture defines a single floating-point condition code, implemented as the
coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control and Status register.
MIPS I, II, and III architectures must have the cc field set to 0, which is implied by the first
format in the Format section.


The MIPS IV architecture adds seven more condition code bits to the original condition code 0. 
FP compare and conditional branch instructions specify the condition code bit to set or test. 
Both assembler formats are valid for MIPS IV. 


Restrictions: 
MIPS II, III: There must be at least one instruction between the compare instruction that sets
a condition code and the branch instruction that tests it. Hardware does not detect a violation
of this restriction.


MIPS IV: None.


Operation: 


MIPS I, II, and III define a single condition code; MIPS IV adds 7 more condition codes. This
operation specification is for the general “Branch On Condition” operation with the tf (true/
false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL,
BC1T, and BC1TL have specific values for tf and nd.


MIPS II and MIPS III:
I-1:  condition ← COC[1] = tf
I:    target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


MIPS IV:
I:    condition ← FCC[cc] = tf
      target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 


Unimplemented Operation 


Programming Notes: 
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.


BC1T                                                       Branch on FP True

 31      26 25      21 20   18 17  16 15                        0
 COP1       BC         cc      nd  tf           offset
 010001     01000              0   1
 6          5          3       1   1            16

Format:      BC1T offset (cc = 0 implied)                MIPS I
             BC1T cc, offset                             MIPS IV

Purpose:     To test an FP condition code and do a PC-relative conditional branch.

Description: if (cc = 1) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the FP condition code bit cc is true (1), branch to the effective target address after the
instruction in the delay slot is executed.


An FP condition code is set by the FP compare instruction, C.cond.fmt. 


The MIPS I architecture defines a single floating-point condition code, implemented as the
coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control and Status register.
MIPS I, II, and III architectures must have the cc field set to 0, which is implied by the first
format in the Format section.


The MIPS IV architecture adds seven more condition code bits to the original condition code 0. 
FP compare and conditional branch instructions specify the condition code bit to set or test. 
Both assembler formats are valid for MIPS IV. 


Restrictions: 
MIPS I, II, III: There must be at least one instruction between the compare instruction that
sets a condition code and the branch instruction that tests it. Hardware does not detect a 
violation of this restriction. 


MIPS IV: None.


Operation: 


MIPS I, II, and III define a single condition code; MIPS IV adds 7 more condition codes. This
operation specification is for the general “Branch On Condition” operation with the tf (true/
false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL,
BC1T, and BC1TL have specific values for tf and nd.


MIPS I:
I-1:  condition ← COC[1] = tf
I:    target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      endif


MIPS II and MIPS III:
I-1:  condition ← COC[1] = tf
I:    target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


MIPS IV:
I:    condition ← FCC[cc] = tf
      target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


Exceptions: 
Coprocessor Unusable 


Reserved Instruction 


Floating-Point 
Unimplemented Operation 


Programming Notes: 
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.


BC1TL                                               Branch on FP True Likely

 31      26 25      21 20   18 17  16 15                        0
 COP1       BC         cc      nd  tf           offset
 010001     01000              1   1
 6          5          3       1   1            16

Format:      BC1TL offset (cc = 0 implied)               MIPS II
             BC1TL cc, offset                            MIPS IV

Purpose:     To test an FP condition code and do a PC-relative conditional branch; execute the
             delay slot only if the branch is taken.

Description: if (cc = 1) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the 
instruction following the branch (not the branch itself), in the branch delay slot, to form a 
PC-relative effective target address. 


If the FP condition code bit cc is true (1), branch to the effective target address after the 
instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay 
slot is not executed. 


An FP condition code is set by the FP compare instruction, C.cond.fmt. 


The MIPS I architecture defines a single floating-point condition code, implemented as the
coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control and Status register.
MIPS I, II, and III architectures must have the cc field set to 0, which is implied by the first
format in the Format section.


The MIPS IV architecture adds seven more condition code bits to the original condition code 0. 
FP compare and conditional branch instructions specify the condition code bit to set or test. 
Both assembler formats are valid for MIPS IV. 


Restrictions: 
MIPS II, III: There must be at least one instruction between the compare instruction that sets
a condition code and the branch instruction that tests it. Hardware does not detect a violation
of this restriction.


MIPS IV: None.


Operation: 


MIPS I, II, and III define a single condition code; MIPS IV adds 7 more condition codes. This
operation specification is for the general “Branch On Condition” operation with the tf (true/
false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL,
BC1T, and BC1TL have specific values for tf and nd.


MIPS II and MIPS III:
I-1:  condition ← COC[1] = tf
I:    target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


MIPS IV:
I:    condition ← FCC[cc] = tf
      target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      else if nd then
          NullifyCurrentInstruction()
      endif


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 


Unimplemented Operation 


Programming Notes: 
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use
jump (J) or jump register (JR) instructions to branch to more distant addresses.


C.cond.fmt                                             Floating-Point Compare

 31      26 25      21 20      16 15      11 10    8 7  6 5  4 3      0
 COP1       fmt        ft         fs         cc      0    FC   cond
 010001                                              00   11
 6          5          5          5          3       2    2    4

Format:      C.cond.S fs, ft (cc = 0 implied)            MIPS I
             C.cond.D fs, ft (cc = 0 implied)
             C.cond.S cc, fs, ft                         MIPS IV
             C.cond.D cc, fs, ft

Purpose:     To compare FP values and record the Boolean result in a condition code.

Description: cc ← fs compare_cond ft


The value in FPR fs is compared to the value in FPR ft; the values are in format fmt. The
comparison is exact and neither overflows nor underflows. If the comparison specified by
cond2..0 is true for the operand values, then the result is true, otherwise it is false. If no exception
is taken, the result is written into condition code cc; true is 1 and false is 0.


If cond3 is set and at least one of the values is a NaN, an Invalid Operation condition is raised;
the result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written and an Invalid Operation
  exception is taken immediately. Otherwise, the Boolean result is written into condition
  code cc.


There are four mutually exclusive ordering relations for comparing floating-point values; one 
relation is always true and the others are false. The familiar relations are greater than, less than, 
and equal. In addition, the IEEE floating-point standard defines the relation unordered which 
is true when at least one operand value is NaN; NaN compares unordered with everything, 
including itself. Comparisons ignore the sign of zero, so +0 equals -0. 
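

The four ordering relations can be illustrated with ordinary IEEE doubles. The sketch below assumes Python floats and a hypothetical helper name; it simply reports which one of the four mutually exclusive relations holds.

```python
import math


def ordering_relation(fs: float, ft: float) -> str:
    """Return the single ordering relation that is true for fs and ft."""
    if math.isnan(fs) or math.isnan(ft):
        return "unordered"          # NaN is unordered with everything
    if fs < ft:
        return "less than"
    if fs > ft:
        return "greater than"
    return "equal"                  # includes +0 compared with -0
```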


The comparison condition is a logical predicate, or equation, of the ordering relations such as
“less than or equal”, “equal”, “not less than”, or “unordered or equal”. Compare distinguishes
sixteen comparison predicates. The Boolean result of the instruction is obtained by substituting
the Boolean value of each ordering relation for the two FP values into the equation. If the equal
relation is true, for example, then all four example predicates above would yield a true result.
If the unordered relation is true then only the final predicate, “unordered or equal”, would yield
a true result.


Logical negation of a compare result allows eight distinct comparisons to test for sixteen 
predicates as shown in Table 2-20. Each mnemonic tests for both a predicate and its logical 
negation. For each mnemonic, compare tests the truth of the first predicate. When the first 
predicate is true, the result is true as shown in the “if predicate is true” column (note that the 
False predicate is never true and False/True do not follow the normal pattern). When the first 
predicate is true, the second predicate must be false, and vice versa. The truth of the second 
predicate is the logical negation of the instruction result. After a compare instruction, test for 
the truth of the first predicate with the Branch on FP True (BC1T) instruction and the truth of 
the second with Branch on FP False (BC1F).


Table 2-20 FPU Comparisons Without Special Operand Exceptions


Instr      Comparison Predicate                           Relation      Comparison CC Result     Instr
cond       name of predicate and logically                values        if pred.    Inv Op       cond field
mnemonic   negated predicate (abbreviation)               >  <  =  ?    is true     excp if      3    2..0
                                                                                    QNaN

F          False [this predicate is always False,         F  F  F  F    F           No           0    0
             it never has a True result]
           True (T)                                       T  T  T  T
UN         Unordered                                      F  F  F  T    T           No           0    1
           Ordered (OR)                                   T  T  T  F    F
EQ         Equal                                          F  F  T  F    T           No           0    2
           Not Equal (NEQ)                                T  T  F  T    F
UEQ        Unordered or Equal                             F  F  T  T    T           No           0    3
           Ordered or Greater than or Less than (OGL)     T  T  F  F    F
OLT        Ordered or Less Than                           F  T  F  F    T           No           0    4
           Unordered or Greater than or Equal (UGE)       T  F  T  T    F
ULT        Unordered or Less Than                         F  T  F  T    T           No           0    5
           Ordered or Greater than or Equal (OGE)         T  F  T  F    F
OLE        Ordered or Less than or Equal                  F  T  T  F    T           No           0    6
           Unordered or Greater Than (UGT)                T  F  F  T    F
ULE        Unordered or Less than or Equal                F  T  T  T    T           No           0    7
           Ordered or Greater Than (OGT)                  T  F  F  F    F

key: “?” = unordered, “>” = greater than, “<” = less than, “=” is equal, “T” = True, “F” = False


There is another set of eight compare operations, distinguished by a cond3 value of 1, testing
the same sixteen conditions. For these additional comparisons, if at least one of the operands is
a NaN, including Quiet NaN, then an Invalid Operation condition is raised. If the Invalid
Operation condition is enabled in the FCSR, then an Invalid Operation exception occurs.


Table 2-21 FPU Comparisons With Special Operand Exceptions for QNaNs


Instr      Comparison Predicate                           Relation      Comparison CC Result     Instr
cond       name of predicate and logically                values        if pred.    Inv Op       cond field
mnemonic   negated predicate (abbreviation)               >  <  =  ?    is true     excp if      3    2..0
                                                                                    QNaN

SF         Signaling False [this predicate always False]  F  F  F  F    F           Yes          1    0
           Signaling True (ST)                            T  T  T  T
NGLE       Not Greater than or Less than or Equal         F  F  F  T    T           Yes          1    1
           Greater than or Less than or Equal (GLE)       T  T  T  F    F
SEQ        Signaling Equal                                F  F  T  F    T           Yes          1    2
           Signaling Not Equal (SNE)                      T  T  F  T    F
NGL        Not Greater than or Less than                  F  F  T  T    T           Yes          1    3
           Greater than or Less than (GL)                 T  T  F  F    F
LT         Less than                                      F  T  F  F    T           Yes          1    4
           Not Less Than (NLT)                            T  F  T  T    F
NGE        Not Greater than or Equal                      F  T  F  T    T           Yes          1    5
           Greater than or Equal (GE)                     T  F  T  F    F
LE         Less than or Equal                             F  T  T  F    T           Yes          1    6
           Not Less than or Equal (NLE)                   T  F  F  T    F
NGT        Not Greater than                               F  T  T  T    T           Yes          1    7
           Greater than (GT)                              T  F  F  F    F

key: “?” = unordered, “>” = greater than, “<” = less than, “=” is equal, “T” = True, “F” = False


The instruction encoding is an extension made in the MIPS IV architecture. In previous 
architecture levels the cc field for this instruction must be 0. 


The MIPS I architecture defines a single floating-point condition code, implemented as the
coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control and Status register.
MIPS I, II, and III architectures must have the cc field set to 0, which is implied by the first
format in the Format section.


The MIPS IV architecture adds seven more condition code bits to the original condition code 0.
FP compare and conditional branch instructions specify the condition code bit to set or test.
Both assembler formats are valid for MIPS IV.


Restrictions: 


The fields fs and ft must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


MIPS I, II, III: There must be at least one instruction between the compare instruction that
sets a condition code and the branch instruction that tests it. Hardware does not detect a 
violation of this restriction. 


Operation: 


if NaN(ValueFPR(fs, fmt)) or NaN(ValueFPR(ft, fmt)) then
    less ← false
    equal ← false
    unordered ← true
    if cond3 then
        SignalException(InvalidOperation)
    endif
else
    less ← ValueFPR(fs, fmt) < ValueFPR(ft, fmt)
    equal ← ValueFPR(fs, fmt) = ValueFPR(ft, fmt)
    unordered ← false
endif
condition ← (cond2 and less) or (cond1 and equal) or (cond0 and unordered)
FCC[cc] ← condition
if cc = 0 then
    COC[1] ← condition
endif
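

The Operation pseudocode can be modeled directly. The following is a sketch under stated assumptions: Python floats stand in for FPR values, the function name `c_cond` is hypothetical, the Invalid Operation condition is returned as a flag instead of being signaled, and Signaling NaN operands (which raise Invalid Operation for every cond value) are not modeled.

```python
import math


def c_cond(cond: int, fs: float, ft: float):
    """Evaluate a compare for a 4-bit cond field.

    Returns (condition, invalid), where invalid reports that the Invalid
    Operation condition would be raised because cond3 is set and an
    operand is a NaN.
    """
    if not 0 <= cond <= 15:
        raise ValueError("cond is a 4-bit field")
    cond3 = (cond >> 3) & 1
    cond2 = (cond >> 2) & 1     # selects the "less than" relation
    cond1 = (cond >> 1) & 1     # selects the "equal" relation
    cond0 = cond & 1            # selects the "unordered" relation

    invalid = False
    if math.isnan(fs) or math.isnan(ft):
        less, equal, unordered = False, False, True
        if cond3:
            invalid = True
    else:
        less, equal, unordered = fs < ft, fs == ft, False

    condition = bool((cond2 and less) or (cond1 and equal) or (cond0 and unordered))
    return condition, invalid
```

For example, cond = 2 corresponds to EQ in Table 2-20 and cond = 10 to SEQ in Table 2-21; the two differ only in whether a NaN operand raises the Invalid Operation condition.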


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented Operation 
Invalid Operation


Programming Notes: 

FP computational instructions, including compare, that receive an operand value of Signaling 
NaN, will raise the Invalid Operation condition. The comparisons that raise the Invalid 
Operation condition for Quiet NaNs in addition to SNaNs, permit a simpler programming model 
if NaNs are errors. Using these compares, programs do not need explicit code to check for 
QNaNs causing the unordered relation. Instead, they take an exception and allow the exception 
handling system to deal with the error when it occurs. For example, consider a comparison in 
which we want to know if two numbers are equal, but for which unordered would be an error. 


# comparisons using explicit tests for QNaN
    c.eq.d  $f2,$f4     # check for equal
    nop
    bc1t    L2          # it is equal
    c.un.d  $f2,$f4     # it is not equal, but might be unordered
    bc1t    ERROR       # unordered goes off to an error handler
# not-equal-case code here


# equal-case code here
L2:


# comparison using comparisons that signal QNaN
    c.seq.d $f2,$f4     # check for equal
    nop
    bc1t    L2          # it is equal
    nop
# it is not unordered here...
# not-equal-case code here


# equal-case code here
L2:
CEIL.L.fmt          Floating-Point Ceiling Convert to Long Fixed-Point


31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd     CEIL.L
 010001            00000                      001010
   6        5        5        5        5        6


Format:  CEIL.L.S fd, fs                    MIPS III
         CEIL.L.D fd, fs


Purpose: To convert an FP value to 64-bit fixed-point, rounding up. 


Description: fd ← convert_and_round(fs)
The value in FPR fs in format fmt is converted to a value in 64-bit long fixed-point format,
rounding toward +∞ (rounding mode 2). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is
  written to fd.


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for long fixed-point; see 2.3 
Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 
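As an illustration of the range check described above, the following Python sketch models the conversion for the case in which the Invalid Operation enable bit is clear, so the default result is written. The function name ceil_l and the returned (result, invalid) pair are assumptions made for this example; the sketch does not model the FCSR itself.

```python
import math

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def ceil_l(value: float) -> tuple:
    """Return (result, invalid) for a ceiling conversion to 64-bit fixed point."""
    if math.isnan(value) or math.isinf(value):
        return INT64_MAX, True          # default result, Invalid Operation flagged
    rounded = math.ceil(value)          # exact integer, rounded toward +infinity
    if not INT64_MIN <= rounded <= INT64_MAX:
        return INT64_MAX, True          # out of range: default result
    return rounded, False
```

The same structure applies to the 32-bit word conversions with the bounds replaced by -2^31 and 2^31-1.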


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation
Unimplemented Operation
Inexact
Overflow
CEIL.W.fmt          Floating-Point Ceiling Convert to Word Fixed-Point


31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd     CEIL.W
 010001            00000                      001110
   6        5        5        5        5        6
Format:  CEIL.W.S fd, fs                    MIPS II
         CEIL.W.D fd, fs
Purpose: To convert an FP value to 32-bit fixed-point, rounding up. 


Description: fd ← convert_and_round(fs)
The value in FPR fs in format fmt is converted to a value in 32-bit word fixed-point format,
rounding toward +∞ (rounding mode 2). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is
  written to fd.


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for word fixed-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation 
Unimplemented Operation 
Inexact 
Overflow
CFC1                Move Control Word from Floating-Point
31    26 25    21 20    16 15    11 10              0
  COP1      CF       rt       fs            0
 010001   00010                       000 0000 0000
   6        5        5        5            11
Format:  CFC1 rt, fs                        MIPS I
Purpose: To copy a word from an FPU control register to a GPR. 


Description: rt ← FP_Control[fs]
Copy the 32-bit word from FP (coprocessor 1) control register fs into GPR rt, sign-extending it 
if the GPR is 64 bits. 


Restrictions: 
There are only a couple of control registers defined for the floating-point unit. The result is not
defined if fs specifies a register that does not exist. 


For MIPS I, MIPS II, and MIPS III, the contents of GPR rt are undefined for the instruction 
immediately following CFC1. 


Operation: MIPS I - III


I:    temp ← FCR[fs]
I+1:  GPR[rt] ← sign_extend(temp)


Operation: MIPS IV


temp ← FCR[fs]
GPR[rt] ← sign_extend(temp)
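The sign_extend step matters only when the GPR is 64 bits wide. A minimal Python sketch of that step follows; the helper name sign_extend32 is hypothetical and the 64-bit register is modelled as an unsigned Python integer.

```python
def sign_extend32(word: int) -> int:
    """Sign-extend a 32-bit word into a 64-bit register image."""
    word &= 0xFFFFFFFF
    if word & 0x80000000:               # bit 31 set: replicate it into bits 63..32
        word |= 0xFFFFFFFF00000000
    return word
```

A control word with bit 31 clear is therefore copied unchanged, while one with bit 31 set fills the upper half of the GPR with ones.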


Exceptions: 
Coprocessor Unusable
CTC1                Move Control Word to Floating-Point
31    26 25    21 20    16 15    11 10              0
  COP1      CT       rt       fs            0
 010001   00110                       000 0000 0000
   6        5        5        5            11
Format:  CTC1 rt, fs                        MIPS I
Purpose: To copy a word from a GPR to an FPU control register. 


Description: FP_Control[fs] ← rt
Copy the low word from GPR rt into FP (coprocessor 1) control register fs. 


Writing to control register 31, the Floating-Point Control and Status Register or FCSR, causes 
the appropriate exception if any cause bit and its corresponding enable bit are both set. The 
register will be written before the exception occurs. 


Restrictions: 
There are only a couple of control registers defined for the floating-point unit. The result is not
defined if fs specifies a register that does not exist. 


For MIPS I, MIPS II, and MIPS III, the contents of floating-point control register fs are 
undefined for the instruction immediately following CTC1. 


Operation: MIPS I - III


I:    temp ← GPR[rt]_31..0
I+1:  FCR[fs] ← temp
      COC[1] ← FCR[31]_23


Operation: MIPS IV


temp ← GPR[rt]_31..0
FCR[fs] ← temp
COC[1] ← FCR[31]_23
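The following Python sketch illustrates the rule that a write to FCSR raises an exception when a cause bit and its corresponding enable bit are both set. The bit positions are an assumption that this section does not state: cause bits V, Z, O, U, I in bits 16..12 and enable bits V, Z, O, U, I in bits 11..7. The function name ctc1_fcsr is hypothetical.

```python
CAUSE_SHIFT = 12    # assumed: cause bits V, Z, O, U, I occupy bits 16..12
ENABLE_SHIFT = 7    # assumed: enable bits V, Z, O, U, I occupy bits 11..7
CONDITION_BIT = 23  # FCR[31] bit 23 is copied to COC[1]


def ctc1_fcsr(value: int) -> tuple:
    """Return (fcsr, coc1, traps) after writing the low word of a GPR to FCSR."""
    fcsr = value & 0xFFFFFFFF                   # register is written first
    cause = (fcsr >> CAUSE_SHIFT) & 0x1F
    enable = (fcsr >> ENABLE_SHIFT) & 0x1F
    traps = (cause & enable) != 0               # any cause bit with its enable set
    coc1 = (fcsr >> CONDITION_BIT) & 1
    return fcsr, coc1, traps
```

Software that restores a saved FCSR can clear the cause bits first if it does not want the write itself to trap.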


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented Operation 
Invalid Operation 
Division-by-zero 
Inexact 
Overflow 
Underflow
CVT.D.fmt           Floating-Point Convert to Double Floating-Point
31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd     CVT.D
 010001            00000                      100001
   6        5        5        5        5        6
Format:  CVT.D.S fd, fs                     MIPS I
         CVT.D.W fd, fs
         CVT.D.L fd, fs                     MIPS III
Purpose: To convert an FP or fixed-point value to double FP.


Description: fd ← convert_and_round(fs)
The value in FPR fs in format fmt is converted to a value in double floating-point format rounded 
according to the current rounding mode in FCSR. The result is placed in FPR fd. 


If fmt is S or W, then the operation is always exact. 
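The exactness claim can be checked with a short Python sketch, since a Python float is an IEEE 754 double. The helper names are hypothetical. Every 32-bit word fits in the 53-bit double significand and every single-precision value is also a double value, whereas a 64-bit long may need rounding.

```python
import struct


def word_to_double(word: int) -> float:
    """Convert a 32-bit fixed-point value to double (always exact)."""
    return float(word)


def single_bits_to_double(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single and widen it to double."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def long_to_double(value: int) -> float:
    """Convert a 64-bit fixed-point value to double (may round)."""
    return float(value)
```

For example, long_to_double(2**63 - 1) cannot be exact because the value needs 63 significant bits.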


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for double floating-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, D, ConvertFmt(ValueFPR(fs, fmt), fmt, D))


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation 
Unimplemented Operation 
Inexact 
Overflow 
Underflow
CVT.L.fmt           Floating-Point Convert to Long Fixed-Point


31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd     CVT.L
 010001            00000                      100101
   6        5        5        5        5        6
Format:  CVT.L.S fd, fs                     MIPS III
         CVT.L.D fd, fs
Purpose: To convert an FP value to 64-bit fixed-point.


Description: fd ← convert_and_round(fs)
Convert the value in format fmt in FPR fs to long fixed-point format, round according to the
current rounding mode in FCSR, and place the result in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active:


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is
  written to fd.


Restrictions:


The fields fs and fd must specify valid FPRs; fs for type fmt and fd for long fixed-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation 
Unimplemented Operation 
Inexact 
Overflow
CVT.S.fmt           Floating-Point Convert to Single Floating-Point
31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd     CVT.S
 010001            00000                      100000
   6        5        5        5        5        6

Format:  CVT.S.D fd, fs                     MIPS I
         CVT.S.W fd, fs
         CVT.S.L fd, fs                     MIPS III
Purpose: To convert an FP or fixed-point value to single FP. 


Description: fd ← convert_and_round(fs)
The value in FPR fs in format fmt is converted to a value in single floating-point format rounded 
according to the current rounding mode in FCSR. The result is placed in FPR fd. 


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for single floating-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation 
Unimplemented Operation 
Inexact 
Overflow 
Underflow
CVT.W.fmt           Floating-Point Convert to Word Fixed-Point


31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd     CVT.W
 010001            00000                      100100
   6        5        5        5        5        6
Format:  CVT.W.S fd, fs                     MIPS I
         CVT.W.D fd, fs
Purpose: To convert an FP value to 32-bit fixed-point.


Description: fd ← convert_and_round(fs)
The value in FPR fs in format fmt is converted to a value in 32-bit word fixed-point format
rounded according to the current rounding mode in FCSR. The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is
  written to fd.


Restrictions: 


The fields fs and fd must specify valid FPRs; fs for type fmt and fd for word fixed-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation 
Unimplemented Operation 
Inexact 
Overflow
DIV.fmt             Floating-Point Divide
31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       ft       fs       fd      DIV
 010001                                       000011
   6        5        5        5        5        6
Format:  DIV.S fd, fs, ft                   MIPS I
         DIV.D fd, fs, ft
Purpose: To divide FP values.


Description: fd ← fs / ft
The value in FPR fs is divided by the value in FPR ft. The result is calculated to infinite 
precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. 
The operands and result are values in format fmt. 


Restrictions: 
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 
StoreFPR(fd, fmt, ValueFPR(fs, fmt) / ValueFPR(ft, fmt))


Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point
Inexact
Unimplemented Operation
Division-by-zero
Invalid Operation
Overflow
Underflow
DMFC1               Doubleword Move From Floating-Point
31    26 25    21 20    16 15    11 10              0
  COP1     DMF       rt       fs            0
 010001   00001                       000 0000 0000
   6        5        5        5            11
Format:  DMFC1 rt, fs                       MIPS III
Purpose: To copy a doubleword from an FPR to a GPR.


Description: rt ← fs
The doubleword contents of FPR fs are placed into GPR rt. 


If the coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit
register emulation mode in a 64-bit processor), FPR fs is held in an even/odd register pair. The 
low word is taken from the even register fs and the high word is from fs+1. 


Restrictions: 
If fs does not specify an FPR that can contain a doubleword, the result is undefined; see 
2.3 Floating-Point Registers. 


For MIPS III, the contents of GPR rt are undefined for the instruction immediately following 
DMFC1. 


Operation: MIPS I - III


I:    if SizeFGR() = 64 then      /* 64-bit wide FGRs */
          data ← FGR[fs]
      elseif fs_0 = 0 then        /* valid specifier, 32-bit wide FGRs */
          data ← FGR[fs+1] || FGR[fs]
      else                        /* undefined for odd 32-bit FGRs */
          UndefinedResult()
      endif
I+1:  GPR[rt] ← data


Operation: MIPS IV


if SizeFGR() = 64 then      /* 64-bit wide FGRs */
    data ← FGR[fs]
elseif fs_0 = 0 then        /* valid specifier, 32-bit wide FGRs */
    data ← FGR[fs+1] || FGR[fs]
else                        /* undefined for odd 32-bit FGRs */
    UndefinedResult()
endif
GPR[rt] ← data
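The even/odd register pairing can be sketched in Python as follows. The function name dmfc1, the list-based register file, and the use of ValueError to stand in for UndefinedResult() are assumptions made for illustration only.

```python
def dmfc1(fgr: list, fs: int, fgr_bits: int) -> int:
    """Return the doubleword read from FPR fs for 64-bit or 32-bit wide FGRs."""
    if fgr_bits == 64:
        return fgr[fs] & 0xFFFFFFFFFFFFFFFF
    if fs & 1 == 0:                         # even specifier: valid register pair
        high = fgr[fs + 1] & 0xFFFFFFFF     # odd register holds the high word
        low = fgr[fs] & 0xFFFFFFFF          # even register holds the low word
        return (high << 32) | low
    # The manual leaves this case undefined; the sketch raises instead.
    raise ValueError("undefined result: odd register with 32-bit FGRs")
```

DMTC1 performs the inverse split, writing bits 63..32 to fs+1 and bits 31..0 to fs.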


Exceptions: 
Reserved Instruction 
Coprocessor Unusable
DMTC1               Doubleword Move To Floating-Point
31    26 25    21 20    16 15    11 10              0
  COP1     DMT       rt       fs            0
 010001   00101                       000 0000 0000
   6        5        5        5            11
Format:  DMTC1 rt, fs                       MIPS III
Purpose: To copy a doubleword from a GPR to an FPR.


Description: fs ← rt
The doubleword contents of GPR rt are placed into FPR fs. 


If coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register
emulation mode in a 64-bit processor), FPR fs is held in an even/odd register pair. The low word 
is placed in the even register fs and the high word is placed in fs+1. 


Restrictions: 
If fs does not specify an FPR that can contain a doubleword, the result is undefined; see 
2.3 Floating-Point Registers. 


For MIPS III, the contents of FPR fs are undefined for the instruction immediately following 
DMTC1. 


Operation: MIPS I - III


I:    data ← GPR[rt]
I+1:  if SizeFGR() = 64 then      /* 64-bit wide FGRs */
          FGR[fs] ← data
      elseif fs_0 = 0 then        /* valid specifier, 32-bit wide FGRs */
          FGR[fs+1] ← data_63..32
          FGR[fs] ← data_31..0
      else                        /* undefined result for odd 32-bit FGRs */
          UndefinedResult()
      endif


Operation: MIPS IV


data ← GPR[rt]
if SizeFGR() = 64 then      /* 64-bit wide FGRs */
    FGR[fs] ← data
elseif fs_0 = 0 then        /* valid specifier, 32-bit wide FGRs */
    FGR[fs+1] ← data_63..32
    FGR[fs] ← data_31..0
else                        /* undefined result for odd 32-bit FGRs */
    UndefinedResult()
endif


Exceptions: 
Reserved Instruction 
Coprocessor Unusable
FLOOR.L.fmt         Floating-Point Floor Convert to Long Fixed-Point


31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd    FLOOR.L
 010001            00000                      001011
   6        5        5        5        5        6
Format:  FLOOR.L.S fd, fs                   MIPS III
         FLOOR.L.D fd, fs
Purpose: To convert an FP value to 64-bit fixed-point, rounding down. 


Description: fd ← convert_and_round(fs)
The value in FPR fs in format fmt is converted to a value in 64-bit long fixed-point format,
rounding toward -∞ (rounding mode 3). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is
  written to fd.


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for long fixed-point; see 2.3 
Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation
Unimplemented Operation
Inexact
Overflow
FLOOR.W.fmt         Floating-Point Floor Convert to Word Fixed-Point


31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd    FLOOR.W
 010001            00000                      001111
   6        5        5        5        5        6
Format:  FLOOR.W.S fd, fs                   MIPS II
         FLOOR.W.D fd, fs
Purpose: To convert an FP value to 32-bit fixed-point, rounding down. 


Description: fd ← convert_and_round(fs)
The value in FPR fs in format fmt is converted to a value in 32-bit word fixed-point format,
rounding toward -∞ (rounding mode 3). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is
  written to fd.


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for word fixed-point; see 2.3 
Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Invalid Operation
Unimplemented Operation
Inexact
Overflow
LDC1                Load Doubleword to Floating-Point


31    26 25    21 20    16 15                      0
  LDC1     base      ft             offset
 110101
   6        5        5                16
Format:  LDC1 ft, offset(base)              MIPS III
Purpose: To load a doubleword from memory to an FPR.


Description: ft ← memory[base+offset]


The contents of the 64-bit doubleword at the memory location specified by the aligned effective
address are fetched and placed in FPR ft. The 16-bit signed offset is added to the contents of
GPR base to form the effective address.


If coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register
emulation mode in a 64-bit processor), FPR ft is held in an even/odd register pair. The low word
is placed in the even register ft and the high word is placed in ft+1.


Restrictions:
If ft does not specify an FPR that can contain a doubleword, the result is undefined; see
2.3 Floating-Point Registers.


An Address Error exception occurs if EffectiveAddress_2..0 ≠ 0 (not doubleword-aligned).


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the
instruction is undefined.


Operation:
vAddr ← sign_extend(offset) + GPR[base]
if vAddr_2..0 ≠ 0^3 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
data ← LoadMemory(uncached, DOUBLEWORD, pAddr, vAddr, DATA)
if SizeFGR() = 64 then      /* 64-bit wide FGRs */
    FGR[ft] ← data
elseif ft_0 = 0 then        /* valid specifier, 32-bit wide FGRs */
    FGR[ft+1] ← data_63..32
    FGR[ft] ← data_31..0
else                        /* undefined result for odd 32-bit FGRs */
    UndefinedResult()
endif
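The address calculation and alignment check at the start of the operation can be sketched in Python. This is a minimal model under stated assumptions: addresses are 64 bits wide, the function name ldc1_effective_address is hypothetical, and ValueError stands in for the Address Error exception.

```python
MASK64 = (1 << 64) - 1


def ldc1_effective_address(base: int, offset: int) -> int:
    """Return the effective address, or raise if it is not doubleword-aligned."""
    offset &= 0xFFFF
    if offset & 0x8000:                 # sign-extend the 16-bit offset field
        offset -= 0x10000
    vaddr = (base + offset) & MASK64
    if vaddr & 0b111:                   # low 3 bits must be zero
        raise ValueError("Address Error: not doubleword-aligned")
    return vaddr
```

An offset field of 0xFFF8 is therefore interpreted as -8, not as 65528.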


Exceptions:
Coprocessor Unusable
Reserved Instruction
TLB Refill, TLB Invalid
Address Error
LDXC1               Load Doubleword Indexed to Floating-Point
31    26 25    21 20    16 15    11 10     6 5      0
 COP1X     base    index      0        fd     LDXC1
 010011                                       000001
   6        5        5        5        5        6
Format:  LDXC1 fd, index(base)              MIPS IV

Purpose: To load a doubleword from memory to an FPR (GPR+GPR addressing).


Description: fd ← memory[base+index]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective 
address are fetched and placed in FPR fd. The contents of GPR index and GPR base are added 
to form the effective address. 


If coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register
emulation mode in a 64-bit processor), FPR fd is held in an even/odd register pair. The low 
word is placed in the even register fd and the high word is placed in fd+1. 


Restrictions: 
If fd does not specify an FPR that can contain a doubleword, the result is undefined; see 
2.3 Floating-Point Registers. 


The Region bits of the effective address must be supplied by the contents of base. If 
EffectiveAddress_63..62 ≠ base_63..62, the result is undefined.


An Address Error exception occurs if EffectiveAddress_2..0 ≠ 0 (not doubleword-aligned).


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation:
vAddr ← GPR[base] + GPR[index]
if vAddr_2..0 ≠ 0^3 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
data ← LoadMemory(uncached, DOUBLEWORD, pAddr, vAddr, DATA)
if SizeFGR() = 64 then      /* 64-bit wide FGRs */
    FGR[fd] ← data
elseif fd_0 = 0 then        /* valid specifier, 32-bit wide FGRs */
    FGR[fd+1] ← data_63..32
    FGR[fd] ← data_31..0
else                        /* undefined result for odd 32-bit FGRs */
    UndefinedResult()
endif
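The two address restrictions for indexed loads can be checked as in the Python sketch below. It assumes 64-bit addresses whose Region bits are bits 63..62; the function name ldxc1_address and the returned tuple are hypothetical.

```python
MASK64 = (1 << 64) - 1


def ldxc1_address(base: int, index: int) -> tuple:
    """Return (vaddr, same_region, aligned) for a base+index doubleword access."""
    vaddr = (base + index) & MASK64
    # Region bits 63..62 of the result must match those supplied by base.
    same_region = (vaddr >> 62) == ((base & MASK64) >> 62)
    aligned = (vaddr & 0b111) == 0      # doubleword alignment
    return vaddr, same_region, aligned
```

A false same_region corresponds to the undefined-result case, and a false aligned corresponds to the Address Error exception.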


Exceptions:
TLB Refill, TLB Invalid
Address Error
Reserved Instruction
Coprocessor Unusable
LWC1                Load Word to Floating-Point
31    26 25    21 20    16 15                      0
  LWC1     base      ft             offset
 110001
   6        5        5                16
Format:  LWC1 ft, offset(base)              MIPS I
Purpose: To load a word from memory to an FPR.


Description: ft ← memory[base+offset]
The contents of the 32-bit word at the memory location specified by the aligned effective 
address are fetched and placed into the low word of coprocessor 1 general register ft. The 16-bit
signed offset is added to the contents of GPR base to form the effective address.


If coprocessor 1 general registers are 64 bits wide, bits 63..32 of register ft become undefined.
See 2.3 Floating-Point Registers. 


Restrictions: 
An Address Error exception occurs if EffectiveAddress_1..0 ≠ 0 (not word-aligned).


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit Processors


I:    /* "mem" is aligned 64-bits from memory. Pick out correct bytes. */
      vAddr ← sign_extend(offset) + GPR[base]
      if vAddr_1..0 ≠ 0^2 then SignalException(AddressError) endif
      (pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
      mem ← LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
I+1:  FGR[ft] ← mem


Operation: 64-bit Processors


/* "mem" is aligned 64-bits from memory. Pick out correct bytes. */
vAddr ← sign_extend(offset) + GPR[base]
if vAddr_1..0 ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
pAddr ← pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
mem ← LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
bytesel ← vAddr_2..0 xor (BigEndianCPU || 0^2)
if SizeFGR() = 64 then      /* 64-bit wide FGRs */
    FGR[ft] ← undefined^32 || mem_31+8*bytesel..8*bytesel
else                        /* 32-bit wide FGRs */
    FGR[ft] ← mem_31+8*bytesel..8*bytesel
endif
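The bytesel step selects which half of the aligned 64-bit memory doubleword holds the requested word. A minimal Python sketch follows, assuming that BigEndianCPU || 0^2 denotes a 3-bit value whose top bit is the endianness flag; the function name select_word is hypothetical.

```python
def select_word(mem64: int, vaddr: int, big_endian_cpu: bool) -> int:
    """Pick the addressed 32-bit word out of an aligned 64-bit doubleword."""
    # bytesel = vAddr bits 2..0 xor (BigEndianCPU followed by two zero bits)
    bytesel = (vaddr & 0b111) ^ ((1 if big_endian_cpu else 0) << 2)
    return (mem64 >> (8 * bytesel)) & 0xFFFFFFFF
```

For a word-aligned address, bytesel is either 0 or 4, so the result is the low or the high word of the doubleword depending on address bit 2 and the CPU endianness.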


Exceptions: 
Coprocessor Unusable
Reserved Instruction
TLB Refill, TLB Invalid
Address Error
LWXC1               Load Word Indexed to Floating-Point
31    26 25    21 20    16 15    11 10     6 5      0
 COP1X     base    index      0        fd     LWXC1
 010011                                       000000
   6        5        5        5        5        6
Format:  LWXC1 fd, index(base)              MIPS IV
Purpose: To load a word from memory to an FPR (GPR+GPR addressing).


Description: fd ← memory[base+index]
The contents of the 32-bit word at the memory location specified by the aligned effective 
address are fetched and placed into the low word of coprocessor 1 general register fd. The 
contents of GPR index and GPR base are added to form the effective address. 


If coprocessor 1 general registers are 64 bits wide, bits 63..32 of register fd become undefined.
See 2.3 Floating-Point Registers. 


Restrictions: 
The Region bits of the effective address must be supplied by the contents of base. If 
EffectiveAddress_63..62 ≠ base_63..62, the result is undefined.


An Address Error exception occurs if EffectiveAddress_1..0 ≠ 0 (not word-aligned).


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation:
vAddr ← GPR[base] + GPR[index]
if vAddr_1..0 ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
pAddr ← pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
/* "mem" is aligned 64-bits from memory. Pick out correct bytes. */
mem ← LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
bytesel ← vAddr_2..0 xor (BigEndianCPU || 0^2)
if SizeFGR() = 64 then      /* 64-bit wide FGRs */
    FGR[fd] ← undefined^32 || mem_31+8*bytesel..8*bytesel
else                        /* 32-bit wide FGRs */
    FGR[fd] ← mem_31+8*bytesel..8*bytesel
endif


Exceptions:
TLB Refill, TLB Invalid
Address Error
Reserved Instruction
Coprocessor Unusable
MADD.fmt            Floating-Point Multiply Add


31    26 25    21 20    16 15    11 10     6 5    3 2    0
 COP1X      fr       ft       fs       fd    MADD   fmt
 010011                                       100
   6        5        5        5        5       3      3
Format:  MADD.S fd, fr, fs, ft              MIPS IV
         MADD.D fd, fr, fs, ft
Purpose: To perform a combined multiply-then-add of FP values.


Description: fd ← (fs × ft) + fr


The value in FPR fs is multiplied by the value in FPR ft to produce a product. The value in FPR
fr is added to the product. The result sum is calculated to infinite precision, rounded according
to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are
values in format fmt.


The accuracy of the result depends on which of two alternative arithmetic models is used by the
implementation for the computation. The numeric models are explained in 2.6.2 Arithmetic
Instructions.


Restrictions: 


The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating- 
Point Registers. If they are not valid, the result is undefined. 


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 


vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, vfr + vfs * vft)
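To show why the choice of arithmetic model can change the result, the Python sketch below contrasts a single rounding of the exact sum with a rounding after the multiply followed by a second rounding after the add. It assumes these are the two models meant by 2.6.2 (the text here does not name them), uses double precision, and uses round-to-nearest only. The function names madd_fused and madd_chained are hypothetical.

```python
from fractions import Fraction


def madd_fused(fr: float, fs: float, ft: float) -> float:
    """Compute fs*ft + fr exactly, then round once to double."""
    exact = Fraction(fs) * Fraction(ft) + Fraction(fr)
    return float(exact)                 # correctly rounded conversion


def madd_chained(fr: float, fs: float, ft: float) -> float:
    """Round the product to double first, then round the sum."""
    return fs * ft + fr
```

With fs = 1 + 2^-30, ft = 1 - 2^-30, and fr = -1, the exact product is 1 - 2^-60. The chained form rounds that product to 1.0 and returns 0.0, while the single-rounding form returns -2^-60.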


Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point
Inexact
Unimplemented Operation
Invalid Operation
Overflow
Underflow
MFC1                Move Word From Floating-Point
31    26 25    21 20    16 15    11 10              0
  COP1      MF       rt       fs            0
 010001   00000                       000 0000 0000
   6        5        5        5            11
Format:  MFC1 rt, fs                        MIPS I
Purpose: To copy a word from an FPU (CP1) general register to a GPR.


Description: rt ← fs
The low word from FPR fs is placed into the low word of GPR rt. If GPR rt is 64 bits wide, 
then the value is sign extended. See 2.3 Floating-Point Registers. 


Restrictions: 
For MIPS I, MIPS II, and MIPS III the contents of GPR rt are undefined for the instruction 
immediately following MFC1. 


Operation: MIPS I - III


I:    word ← FGR[fs]_31..0
I+1:  GPR[rt] ← sign_extend(word)


Operation: MIPS IV


word ← FGR[fs]_31..0
GPR[rt] ← sign_extend(word)


Exceptions: 
Coprocessor Unusable
MOV.fmt             Floating-Point Move


31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       0        fs       fd      MOV
 010001            00000                      000110
   6        5        5        5        5        6
Format:  MOV.S fd, fs                       MIPS I
         MOV.D fd, fs
Purpose: To move an FP value between FPRs.


Description: fd ← fs
The value in FPR fs is placed into FPR fd. The source and destination are values in format fmt. 


The move is non-arithmetic; it causes no IEEE 754 exceptions. 


Restrictions: 
The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, fmt, ValueFPR(fs, fmt)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented Operation
MOVF                Move Conditional on FP False
31    26 25    21 20  18 17 16 15    11 10     6 5      0
SPECIAL     rs      cc   0  tf    rd       0      MOVCI
 000000                  0  0            00000    000001
   6        5       3    1  1     5        5        6

Format:  MOVF rd, rs, cc                    MIPS IV
Purpose: To test an FP condition code then conditionally move a GPR.


Description: if (cc = 0) then rd ← rs
If the floating-point condition code specified by cc is zero, then the contents of GPR rs are
placed into GPR rd.


Restrictions: 
None 


Operation:
active ← FCC[cc] = tf
if active then
    GPR[rd] ← GPR[rs]
endif
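The conditional move shares one operation for both senses of the test; the tf bit in the encoding selects whether the move happens on a false (0) or true (1) condition code. A minimal Python sketch, with the hypothetical function name movci and list-based register and condition-code state:

```python
def movci(gpr: list, fcc: list, rd: int, rs: int, cc: int, tf: int) -> None:
    """Copy GPR rs to GPR rd when condition code cc equals the tf bit."""
    if fcc[cc] == tf:
        gpr[rd] = gpr[rs]
```

Calling it with tf = 0 models MOVF and with tf = 1 models MOVT.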


Exceptions: 
Reserved Instruction 
Coprocessor Unusable 


MOVF.fmt            Floating-Point Move Conditional on FP False
31    26 25    21 20  18 17 16 15    11 10     6 5      0
  COP1     fmt      cc   0  tf    fs       fd     MOVCF
 010001                  0  0                     010001
   6        5       3    1  1     5        5        6
Format:  MOVF.S fd, fs, cc                  MIPS IV
         MOVF.D fd, fs, cc
Purpose: To test an FP condition code then conditionally move an FP value.


Description: if (cc = 0) then fd ← fs
If the floating-point condition code specified by cc is zero, then the value in FPR fs is placed 
into FPR fd. The source and destination are values in format fmt. 


If the condition code is not zero, then FPR fs is not copied and FPR fd contains its previous value 
in format fmt. If fd did not contain a value either in format fmt or previously unused data from 
a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes 
undefined. 


The move is non-arithmetic; it causes no IEEE 754 exceptions. 


Restrictions: 
The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation:
if FCC[cc] = tf then
    StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
    StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented Operation
MOVN.fmt            Floating-Point Move Conditional on Not Zero
31    26 25    21 20    16 15    11 10     6 5      0
  COP1     fmt       rt       fs       fd      MOVN
 010001                                       010011
   6        5        5        5        5        6
Format:  MOVN.S fd, fs, rt                  MIPS IV
         MOVN.D fd, fs, rt
Purpose: To test a GPR then conditionally move an FP value.


Description: if (rt ≠ 0) then fd ← fs
If the value in GPR rt is not equal to zero then the value in FPR fs is placed in FPR fd. The 
source and destination are values in format fmt. 


If GPR rt contains zero, then FPR fs is not copied and FPR fd contains its previous value in 
format fmt. If fd did not contain a value either in format fmt or previously unused data from a 
load or move-to operation that could be interpreted in format fmt, then the value of fd becomes 
undefined. 


The move is non-arithmetic; it causes no IEEE 754 exceptions. 


Restrictions: 
The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 


if GPR[rt] ≠ 0 then
    StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
    StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented operation 
MOVT Move Conditional on FP True


31     26 25     21 20  18 17 16 15     11 10      6 5       0
 SPECIAL     rs       cc    0  tf    rd        0       MOVCI
 000000                     0   1            00000    000001
    6         5        3    1   1     5         5        6
Format: MOVT rd, rs, cc MIPS IV
Purpose: To test an FP condition code then conditionally move a GPR.


Description: if (cc = 1) then rd ← rs


If the floating-point condition code specified by cc is one then the contents of GPR rs are placed 
into GPR rd. 


Restrictions: 
None 


Operation: 
if FCC[cc] = tf then
    GPR[rd] ← GPR[rs]
endif


Exceptions: 
Reserved Instruction 
Coprocessor Unusable 
Floating-Point Move Conditional on FP True MOVT.fmt
31     26 25     21 20  18 17 16 15     11 10      6 5       0
  COP1       fmt      cc    0  tf    fs        fd      MOVCF
 010001                     0   1                     010001
    6         5        3    1   1     5         5        6
Format: MOVT.S fd, fs, cc MIPS IV
        MOVT.D fd, fs, cc
Purpose: To test an FP condition code then conditionally move an FP value.


Description: if (cc = 1) then fd ← fs
If the floating-point condition code specified by cc is one then the value in FPR fs is placed into 
FPR fd. The source and destination are values in format fmt. 


If the condition code is not one, then FPR fs is not copied and FPR fd contains its previous value 
in format fmt. If fd did not contain a value either in format fmt or previously unused data from 
a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes 
undefined. 


The move is non-arithmetic; it causes no IEEE 754 exceptions. 


Restrictions: 
The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 


if FCC[cc] = tf then
    StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
    StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented operation 
MOVZ.fmt Floating-Point Move Conditional on Zero
31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       rt        fs        fd       MOVZ
 010001                                            010010
    6         5         5         5         5         6
Format: MOVZ.S fd, fs, rt MIPS IV
        MOVZ.D fd, fs, rt
Purpose: To test a GPR then conditionally move an FP value.


Description: if (rt = 0) then fd ← fs
If the value in GPR rt is equal to zero then the value in FPR fs is placed in FPR fd. The source 
and destination are values in format fmt. 


If GPR rt is not zero, then FPR fs is not copied and FPR fd contains its previous value in format
fmt. If fd did not contain a value either in format fmt or previously unused data from a load or
move-to operation that could be interpreted in format fmt, then the value of fd becomes 
undefined. 


The move is non-arithmetic; it causes no IEEE 754 exceptions. 


Restrictions: 
The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 


if GPR[rt] = 0 then
    StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
    StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented operation 
Floating-Point Multiply Subtract MSUB.fmt


31     26 25     21 20     16 15     11 10      6 5    3 2    0
  COP1X      fr        ft        fs        fd      MSUB    fmt
 010011                                             101
    6         5         5         5         5        3      3
Format: MSUB.S fd, fr, fs, ft MIPS IV
        MSUB.D fd, fr, fs, ft
Purpose: To perform a combined multiply-then-subtract of FP values.


Description: fd ← (fs × ft) - fr
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The 
value in FPR fr is subtracted from the product. The subtraction result is calculated to infinite 
precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. 
The operands and result are values in format fmt. 


The accuracy of the result depends on which of two alternative arithmetic models is used by the
implementation for the computation. The numeric models are explained in 2.6.2 Arithmetic
Instructions.


Restrictions: 
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating- 
Point Registers. If they are not valid, the result is undefined. 


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 


vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, (vfs * vft) - vfr)
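
To see why the choice of arithmetic model can change the result, the sketch below compares two plausible models for double-precision operands. It assumes, without confirmation from this section, that the two models are a chained computation (the product is rounded before the subtraction) and a fused computation (a single rounding at the end). The function names are hypothetical, and only round-to-nearest is modeled.

```python
from fractions import Fraction


def msub_chained(fs, ft, fr):
    """Product is rounded to double, then the difference is rounded again."""
    return (fs * ft) - fr


def msub_fused(fs, ft, fr):
    """Product and difference are formed exactly, then rounded once."""
    exact = Fraction(fs) * Fraction(ft) - Fraction(fr)
    return float(exact)  # Fraction-to-float conversion rounds to nearest


fs = 1.0 + 2.0 ** -30
ft = 1.0 - 2.0 ** -30
fr = 1.0

chained = msub_chained(fs, ft, fr)  # the product rounds to 1.0 first
fused = msub_fused(fs, ft, fr)      # the exact product is 1 - 2**-60
```

With these operands the chained form yields 0.0 while the fused form yields -2**-60, so software that depends on the low-order bits should not assume one model.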


Exceptions: 
Reserved Instruction 
Coprocessor Unusable 
Floating-Point
    Inexact               Unimplemented Operation
    Invalid Operation     Overflow
    Underflow
MTC1 Move Word to Floating-Point
31     26 25     21 20     16 15     11 10                 0
  COP1       MT        rt        fs             0
 010001     00100                         000 0000 0000
    6         5         5         5             11
Format: MTC1 rt, fs MIPS I
Purpose: To copy a word from a GPR to an FPU (CP1) general register.


Description: fs ← rt
The low word in GPR rt is placed into the low word of floating-point (coprocessor 1) general
register fs. If coprocessor 1 general registers are 64 bits wide, bits 63..32 of register fs become
undefined. See 2.3 Floating-Point Registers.


Restrictions: 
For MIPS I, MIPS II, and MIPS III the value of FPR fs is undefined for the instruction 
immediately following MTC1. 


Operation: MIPS I - III
I:      data ← GPR[rt]_{31..0}
I+1:    if SizeFGR() = 64 then      /* 64-bit wide FGRs */
            FGR[fs] ← undefined^{32} || data
        else                        /* 32-bit wide FGRs */
            FGR[fs] ← data
        endif


Operation: MIPS IV
data ← GPR[rt]_{31..0}
if SizeFGR() = 64 then              /* 64-bit wide FGRs */
    FGR[fs] ← undefined^{32} || data
else                                /* 32-bit wide FGRs */
    FGR[fs] ← data
endif
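
As a rough illustration of the register update, the sketch below models FGR contents as Python integers. The function name `mtc1` and the `fgr_width` parameter are hypothetical, and random bits merely stand in for the "undefined" upper word; real hardware may leave anything there.

```python
import random

MASK32 = 0xFFFF_FFFF


def mtc1(gpr_rt, fgr_width=64):
    """Return the new contents of FGR[fs] after MTC1 (illustrative only)."""
    data = gpr_rt & MASK32                  # low word of GPR rt
    if fgr_width == 64:
        undefined = random.getrandbits(32)  # placeholder for bits 63..32
        return (undefined << 32) | data
    return data                             # 32-bit wide FGRs


new_fs = mtc1(0x1234_5678_9ABC_DEF0)
low_word = new_fs & MASK32
```

Only `low_word` is meaningful after the move; code should not read the upper half of a 64-bit FGR written this way.
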
Exceptions: 


Coprocessor Unusable 
Floating-Point Multiply MUL.fmt


31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       ft        fs        fd       MUL
 010001                                            000010
    6         5         5         5         5         6
Format: MUL.S fd, fs, ft MIPS I
        MUL.D fd, fs, ft
Purpose: To multiply FP values.


Description: fd ← fs × ft


The value in FPR fs is multiplied by the value in FPR ft. The result is calculated to infinite 
precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. 


The operands and result are values in format fmt. 


Restrictions: 


The fields fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point
Registers. If they are not valid, the result is undefined.


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 
StoreFPR (fd, fmt, ValueFPR(fs, fmt) * ValueFPR(ft, fmt)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 

Floating-Point
    Inexact               Unimplemented Operation
    Invalid Operation     Overflow
    Underflow
NEG.fmt Floating-Point Negate


31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       0         fs        fd       NEG
 010001               00000                        000111
    6         5         5         5         5         6
Format: NEG.S fd, fs MIPS I
        NEG.D fd, fs
Purpose: To negate an FP value.


Description: fd ← -(fs)
The value in FPR fs is negated and placed into FPR fd. The value is negated by changing the 
sign bit value. The operand and result are values in format fmt. 


This operation is arithmetic; a NaN operand signals invalid operation. 


Restrictions: 
The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, fmt, Negate(ValueFPR(fs, fmt))) 
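
Negating "by changing the sign bit" can be illustrated on the IEEE 754 double encoding. The sketch below is a hypothetical helper, not part of the manual; it covers only the bit manipulation for format D and does not model the Invalid Operation signaled for a NaN operand.

```python
import math
import struct


def neg_d(value):
    """Negate a double by flipping bit 63 of its IEEE 754 encoding."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits ^= 1 << 63                      # toggle the sign bit only
    (result,) = struct.unpack("<d", struct.pack("<Q", bits))
    return result


negated = neg_d(2.5)
negated_zero = neg_d(0.0)                # sign flips even for zero
```

Because only the sign bit changes, negating +0 produces -0 rather than leaving the value unchanged.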


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented Operation 
Invalid Operation 
Floating-Point Negative Multiply Add NMADD.fmt


31     26 25     21 20     16 15     11 10      6 5    3 2    0
  COP1X      fr        ft        fs        fd      NMADD   fmt
 010011                                             110
    6         5         5         5         5        3      3
Format: NMADD.S fd, fr, fs, ft MIPS IV
        NMADD.D fd, fr, fs, ft
Purpose: To negate a combined multiply-then-add of FP values.


Description: fd ← -((fs × ft) + fr)
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The 
value in FPR fr is added to the product. The result sum is calculated to infinite precision, 
rounded according to the current rounding mode in FCSR, negated by changing the sign bit, and 
placed into FPR fd. The operands and result are values in format fmt. 


The accuracy of the result depends on which of two alternative arithmetic models is used by the
implementation for the computation. The numeric models are explained in 2.6.2 Arithmetic
Instructions.


Restrictions: 
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating- 
Point Registers. If they are not valid, the result is undefined. 


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 


vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, -(vfr + vfs * vft))


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point
    Inexact               Unimplemented Operation
    Invalid Operation     Overflow
    Underflow




NMSUB.fmt Floating-Point Negative Multiply Subtract
31     26 25     21 20     16 15     11 10      6 5    3 2    0
  COP1X      fr        ft        fs        fd      NMSUB   fmt
 010011                                             111
    6         5         5         5         5        3      3
Format: NMSUB.S fd, fr, fs, ft MIPS IV
        NMSUB.D fd, fr, fs, ft
Purpose: To negate a combined multiply-then-subtract of FP values.


Description: fd ← -((fs × ft) - fr)
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The 
value in FPR fr is subtracted from the product. The result is calculated to infinite precision,
rounded according to the current rounding mode in FCSR, negated by changing the sign bit, and 
placed into FPR fd. The operands and result are values in format fmt. 


The accuracy of the result depends on which of two alternative arithmetic models is used by the
implementation for the computation. The numeric models are explained in 2.6.2 Arithmetic
Instructions.


Restrictions: 
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating- 
Point Registers. If they are not valid, the result is undefined. 


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 


vfr ← ValueFPR(fr, fmt)
vfs ← ValueFPR(fs, fmt)
vft ← ValueFPR(ft, fmt)
StoreFPR(fd, fmt, -((vfs * vft) - vfr))


Exceptions: 
Reserved Instruction 
Coprocessor Unusable 
Floating-Point
    Inexact               Unimplemented Operation
    Invalid Operation     Overflow
    Underflow
Prefetch Indexed (R10000 only) PREFX


31     26 25     21 20     16 15     11 10      6 5       0
  COP1X     base      index     hint       0       PREFX
 010011                                  00000     001111
    6         5         5         5         5         6
Format: PREFX hint, index(base) MIPS IV
Purpose: To prefetch locations from memory (GPR+GPR addressing). 


Description: prefetch_memory[base+index] 
PREFX adds the contents of GPR index to the contents of GPR base to form an effective byte 
address. It advises that data at the effective address may be used in the near future. The hint 
field supplies information about the way that the data is expected to be used. 


PREFX is an advisory instruction. It may change the performance of the program. For all hint 
values, it neither changes architecturally-visible state nor alters the meaning of the program. An 
implementation may do nothing when executing a PREFX instruction. 


If MIPS IV instructions are supported and enabled and Coprocessor 1 is enabled (allowing
access to CP1X), PREFX does not cause addressing-related exceptions. If it raises an exception
condition, the exception condition is ignored. If an addressing-related exception condition is 
raised and ignored, no data will be prefetched. Even if no data is prefetched in such a case, some 
action that is not architecturally-visible, such as writeback of a dirty cache line, might take 
place. 


PREFX will never generate a memory operation for a location with an uncached memory access 
type (see 1.6 Memory Access Types). 


If PREFX results in a memory operation, the memory access type used for the operation is 
determined by the memory access type of the effective address, just as it would be if the memory 
operation had been caused by a load or store to the effective address. 


PREFX enables the processor to take some action, typically prefetching the data into cache, to 
improve program performance. The action taken for a specific PREFX instruction is both 
system and context dependent. Any action, including doing nothing, is permitted that does not 
change architecturally-visible state or alter the meaning of a program. It is expected that 
implementations will either do nothing or take an action that will increase the performance of 
the program. 


For a cached location, the expected, and useful, action is for the processor to prefetch a block 
of data that includes the effective address. The size of the block, and the level of the memory 
hierarchy it is fetched into are implementation specific. 


The hint field supplies information about the way the data is expected to be used. No hint value 
causes an action that modifies architecturally-visible state. A processor may use a hint value to 
improve the effectiveness of the prefetch action. The defined hint values and the recommended 
prefetch action are shown in the table below. The hint table may be extended in future 
implementations. 


Table 2-22 Values of Hint Field for Prefetch Instruction 


Value   Name             Data use and desired prefetch action

0       load             Data is expected to be loaded (not modified).
                         Fetch data as if for a load.

1       store            Data is expected to be stored or modified.
                         Fetch data as if for a store.

2-3                      Not yet defined.

4       load_streamed    Data is expected to be loaded (not modified) but not reused
                         extensively; it will “stream” through cache.
                         Fetch data as if for a load and place it in the cache so that it
                         will not displace data prefetched as “retained”.

5       store_streamed   Data is expected to be stored or modified but not reused
                         extensively; it will “stream” through cache.
                         Fetch data as if for a store and place it in the cache so that it
                         will not displace data prefetched as “retained”.

6       load_retained    Data is expected to be loaded (not modified) and reused
                         extensively; it should be “retained” in the cache.
                         Fetch data as if for a load and place it in the cache so that it
                         will not be displaced by data prefetched as “streamed”.

7       store_retained   Data is expected to be stored or modified and reused
                         extensively; it should be “retained” in the cache.
                         Fetch data as if for a store and place it in the cache so that it
                         will not be displaced by data prefetched as “streamed”.

8-31                     Not yet defined.
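
For tools such as a disassembler or simulator, the hint table can be captured as a small lookup. The sketch below is only an illustration: the names `PREFETCH_HINTS` and `decode_hint` are hypothetical, and returning None for values that are not yet defined is a choice made here, not something the table prescribes.

```python
# Defined values of the 5-bit hint field, taken from Table 2-22.
PREFETCH_HINTS = {
    0: "load",
    1: "store",
    4: "load_streamed",
    5: "store_streamed",
    6: "load_retained",
    7: "store_retained",
}


def decode_hint(hint):
    """Return the hint name, or None if the value is not yet defined."""
    if not 0 <= hint <= 31:
        raise ValueError("hint is a 5-bit field (0..31)")
    return PREFETCH_HINTS.get(hint)


retained = decode_hint(6)
reserved = decode_hint(2)
```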


Restrictions: 
The Region bits of the effective address must be supplied by the contents of base. If
EffectiveAddress_{63..62} ≠ base_{63..62}, the result of the instruction is undefined.


Operation: 


vAddr ← GPR[base] + GPR[index]
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
Prefetch(uncached, pAddr, vAddr, DATA, hint)


Exceptions: 
Reserved Instruction 
Coprocessor Unusable 


Programming Notes: 
Prefetch cannot prefetch data from a mapped location unless the translation for that location is
present in the TLB. Locations in memory pages that have not been accessed recently may not 
have translations in the TLB, so prefetch may not be effective for such locations. 


Prefetch does not cause addressing exceptions. It will not cause an exception to prefetch using 
an address pointer value before the validity of a pointer is determined. 


Implementation Notes: 
It is recommended that a reserved hint field value either cause a default prefetch action that is
expected to be useful for most cases of data use, such as the “load” hint, or cause the instruction
to be treated as a NOP.
RECIP.fmt Reciprocal Approximation


31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       0         fs        fd      RECIP
 010001               00000                        010101
    6         5         5         5         5         6
Format: RECIP.S fd, fs MIPS IV
        RECIP.D fd, fs
Purpose: To approximate the reciprocal of an FP value (quickly).


Description: fd ← 1.0 / fs
The reciprocal of the value in FPR fs is approximated and placed into FPR fd. 
The operand and result are values in format fmt. 


The numeric accuracy of this operation is implementation dependent; it does not meet the 
accuracy specified by the IEEE 754 Floating-Point standard. The computed result differs from
both the exact result and the IEEE-mandated representation of the exact result by no more
than one unit in the least-significant place (ulp).
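
The accuracy bound can be expressed as a check on a candidate result. The sketch below is a hypothetical test helper for double-precision values, not a description of any hardware. It assumes the ulp is measured at the correctly rounded (IEEE) reciprocal, and it relies on `math.ulp` and `math.nextafter`, which need Python 3.9 or later.

```python
import math
from fractions import Fraction


def recip_within_bound(x, approx, max_ulps=1):
    """True if approx is within max_ulps of both 1/x and its IEEE rounding."""
    exact = 1 / Fraction(x)            # exact reciprocal as a rational
    ieee = float(exact)                # correctly rounded double
    ulp = Fraction(math.ulp(ieee))     # assumed definition of one ulp
    a = Fraction(approx)
    return (abs(a - exact) <= max_ulps * ulp
            and abs(a - Fraction(ieee)) <= max_ulps * ulp)


ieee_third = 1.0 / 3.0
one_up = math.nextafter(ieee_third, 1.0)     # one ulp above the IEEE result
one_down = math.nextafter(ieee_third, 0.0)   # one ulp below the IEEE result
```

For x = 3.0 the IEEE result lies slightly below the exact value, so `one_up` still satisfies the bound while `one_down` does not.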


It is implementation dependent whether the result is affected by the current rounding mode in 
FCSR. 


Restrictions: 
The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation:
StoreFPR(fd, fmt, 1.0 / ValueFPR(fs, fmt))


Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point
    Inexact               Unimplemented Operation
    Division-by-zero      Invalid Operation
    Overflow              Underflow




Floating-Point Round to Long Fixed-Point ROUND.L.fmt


31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       0         fs        fd     ROUND.L
 010001               00000                        001000
    6         5         5         5         5         6
Format: ROUND.L.S fd, fs MIPS III
        ROUND.L.D fd, fs
Purpose: To convert an FP value to 64-bit fixed-point, rounding to nearest.


Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed-point format,
rounding to nearest/even (rounding mode 0). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^{63} to 2^{63}-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^{63}-1, is
  written to fd.


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for long fixed-point; see 2.3 
Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 
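
The conversion with the Invalid Operation exception disabled can be sketched as follows. This is an illustration under stated assumptions: the function name `round_to_fixed` and its `bits` parameter are hypothetical, the representable range is taken to be the two's-complement range for the given width, and the default result is assumed to be the largest positive value. Python's built-in `round` already rounds ties to even, which matches rounding mode 0.

```python
import math


def round_to_fixed(value, bits=64):
    """Sketch of ROUND.L.fmt (bits=64) or ROUND.W.fmt (bits=32).

    Returns (result, invalid_flag) with the Invalid Operation trap disabled.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if math.isnan(value) or math.isinf(value):
        return hi, True              # assumed default result
    rounded = round(value)           # round to nearest, ties to even
    if rounded < lo or rounded > hi:
        return hi, True              # out of range: Invalid Operation
    return rounded, False


tie_down = round_to_fixed(2.5)
tie_up = round_to_fixed(3.5)
```

The two tie cases land on the even neighbors, 2 and 4 respectively.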


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point
    Inexact               Unimplemented Operation
    Overflow              Invalid Operation
ROUND.W.fmt Floating-Point Round to Word Fixed-Point


31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       0         fs        fd     ROUND.W
 010001               00000                        001100
    6         5         5         5         5         6
Format: ROUND.W.S fd, fs MIPS II
        ROUND.W.D fd, fs
Purpose: To convert an FP value to 32-bit fixed-point, rounding to nearest.


Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed-point format,
rounding to nearest/even (rounding mode 0). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^{31} to 2^{31}-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
  Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
  Operation exception is taken immediately. Otherwise, the default result, 2^{31}-1, is
  written to fd.


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for word fixed-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point
    Inexact               Unimplemented Operation
    Invalid Operation     Overflow
Reciprocal Square Root Approximation RSQRT.fmt
31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       0         fs        fd      RSQRT
 010001               00000                        010110
    6         5         5         5         5         6
Format: RSQRT.S fd, fs MIPS IV
        RSQRT.D fd, fs
Purpose: To approximate the reciprocal of the square root of an FP value (quickly).


Description: fd ← 1.0 / sqrt(fs)
The reciprocal of the positive square root of the value in FPR fs is approximated and placed 
into FPR fd. The operand and result are values in format fmt. 


The numeric accuracy of this operation is implementation dependent; it does not meet the 
accuracy specified by the IEEE 754 Floating-Point standard. The computed result differs
from both the exact result and the IEEE-mandated representation of the exact result by
no more than two units in the least-significant place (ulp).


It is implementation dependent whether the result is affected by the current rounding 
mode in FCSR. 
Restrictions: 


The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If 
it is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, fmt, 1.0 / SquareRoot(ValueFPR(fs, fmt)))


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point
    Inexact               Unimplemented Operation
    Division-by-zero      Invalid Operation
    Overflow              Underflow
SDC1 Store Doubleword from Floating-Point
31     26 25     21 20     16 15                          0
  SDC1      base       ft               offset
 111101
    6         5         5                 16
Format: SDC1 ft, offset(base) MIPS III
Purpose: To store a doubleword from an FPR to memory. 


Description: memory[base+offset] ← ft
The 64-bit doubleword in FPR ft is stored in memory at the location specified by the aligned
effective address. The 16-bit signed offset is added to the contents of GPR base to form the
effective address.


If coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register
emulation mode in a 64-bit processor), FPR ft is held in an even/odd register pair. The low word
is taken from the even register ft and the high word is from ft+1.


Restrictions: 
If ft does not specify an FPR that can contain a doubleword, the result is undefined; see 
2.3 Floating-Point Registers. 


An Address Error exception occurs if EffectiveAddress_{2..0} ≠ 0 (not doubleword-aligned).


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 


vAddr ← sign_extend(offset) + GPR[base]
if vAddr_{2..0} ≠ 0^{3} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)

if SizeFGR() = 64 then          /* 64-bit wide FGRs */
    data ← FGR[ft]
elseif ft_{0} = 0 then          /* valid specifier, 32-bit wide FGRs */
    data ← FGR[ft+1] || FGR[ft]
else                            /* undefined for odd 32-bit FGRs */
    UndefinedResult()
endif

StoreMemory(uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
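
The address and data steps of the Operation section can be followed in a short Python sketch. Everything named here is hypothetical (`sdc1_data`, `sign_extend16`, the `AddressError` class, and the `fgr_width` parameter); FGRs are modeled as a list of unsigned integers, address translation and the memory write are omitted, and raising ValueError merely stands in for the undefined result.

```python
class AddressError(Exception):
    """Stand-in for the Address Error exception."""


def sign_extend16(offset):
    """Interpret the low 16 bits of offset as a signed value."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset


def sdc1_data(fgr, ft, base, offset, fgr_width=64):
    """Return (vaddr, doubleword) that SDC1 would store, or raise."""
    vaddr = (base + sign_extend16(offset)) & 0xFFFF_FFFF_FFFF_FFFF
    if vaddr & 0x7:                       # not doubleword-aligned
        raise AddressError(hex(vaddr))
    if fgr_width == 64:
        data = fgr[ft]
    elif ft % 2 == 0:                     # even/odd pair: high word in ft+1
        data = (fgr[ft + 1] << 32) | fgr[ft]
    else:
        raise ValueError("odd ft with 32-bit FGRs: result is undefined")
    return vaddr, data


fgr = [0] * 32
fgr[2], fgr[3] = 0x1111_1111, 0x2222_2222
stored = sdc1_data(fgr, 2, base=0x1000, offset=0xFFF8, fgr_width=32)
```

Here the offset 0xFFF8 sign-extends to -8, so the doubleword built from the f2/f3 pair would be stored at 0xFF8.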


Exceptions: 
Coprocessor unusable 
Reserved Instruction 
TLB Refill, TLB Invalid 
TLB Modified 
Address Error 
Store Doubleword Indexed from Floating-Point SDXC1
31     26 25     21 20     16 15     11 10      6 5       0
  COP1X     base      index      fs        0       SDXC1
 010011                                            001001
    6         5         5         5         5         6
Format: SDXC1 fs, index(base) MIPS IV
Purpose: To store a doubleword from an FPR to memory (GPR+GPR addressing).


Description: memory[base+index] ← fs
The 64-bit doubleword in FPR fs is stored in memory at the location specified by the aligned
effective address. The contents of GPR index and GPR base are added to form the effective
address.


If coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register
emulation mode in a 64-bit processor), FPR fs is held in an even/odd register pair. The low word
is taken from the even register fs and the high word is from fs+1.


Restrictions: 
If fs does not specify an FPR that can contain a doubleword, the result is undefined; see 
2.3 Floating-Point Registers. 


The Region bits of the effective address must be supplied by the contents of base. If
EffectiveAddress_{63..62} ≠ base_{63..62}, the result is undefined.


An Address Error exception occurs if EffectiveAddress_{2..0} ≠ 0 (not doubleword-aligned).


MIPS IV: The low-order 3 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 


vAddr ← GPR[base] + GPR[index]
if vAddr_{2..0} ≠ 0^{3} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)

if SizeFGR() = 64 then          /* 64-bit wide FGRs */
    data ← FGR[fs]
elseif fs_{0} = 0 then          /* valid specifier, 32-bit wide FGRs */
    data ← FGR[fs+1] || FGR[fs]
else                            /* undefined for odd 32-bit FGRs */
    UndefinedResult()
endif

StoreMemory(uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)


Exceptions: 
TLB Refill, TLB Invalid 
TLB Modified 
Address Error 
Reserved Instruction 
Coprocessor Unusable 
SQRT.fmt Floating-Point Square Root


31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       0         fs        fd       SQRT
 010001               00000                        000100
    6         5         5         5         5         6
Format: SQRT.S fd, fs MIPS II
        SQRT.D fd, fs
Purpose: To compute the square root of an FP value.


Description: fd ← SQRT(fs)
The square root of the value in FPR fs is calculated to infinite precision, rounded according to
the current rounding mode in FCSR, and placed into FPR fd. The operand and result are values
in format fmt.


If the value in FPR fs corresponds to -0, the result will be -0.


Restrictions: 
If the value in FPR fs is less than 0, an Invalid Operation condition is raised. 


The fields fs and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point 
Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, fmt, SquareRoot(ValueFPR(fs, fmt))) 
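
The special cases can be exercised with a small Python model. The helper name `sqrt_d` is hypothetical; returning NaN together with an invalid flag for a negative operand is an assumption in the spirit of the IEEE 754 default, since this section only states that an Invalid Operation condition is raised. Only the default round-to-nearest mode is represented.

```python
import math


def sqrt_d(value):
    """Sketch of SQRT.D: returns (result, invalid_flag)."""
    if value < 0.0:                 # note: -0.0 < 0.0 is False
        return math.nan, True       # assumed result for Invalid Operation
    return math.sqrt(value), False  # math.sqrt(-0.0) returns -0.0


neg_zero_root, neg_zero_invalid = sqrt_d(-0.0)
nine_root = sqrt_d(9.0)
bad_root, bad_invalid = sqrt_d(-1.0)
```

The negative-zero operand passes through with its sign intact and does not count as "less than 0".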


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Unimplemented Operation 
Invalid Operation 
Inexact 




Floating-Point Subtract SUB.fmt


31     26 25     21 20     16 15     11 10      6 5       0
  COP1       fmt       ft        fs        fd       SUB
 010001                                            000001
    6         5         5         5         5         6
Format: SUB.S fd, fs, ft MIPS I
        SUB.D fd, fs, ft
Purpose: To subtract FP values.


Description: fd ← fs - ft


The value in FPR ft is subtracted from the value in FPR fs. The result is calculated to infinite 
precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. 


The operands and result are values in format fmt. 


Restrictions: 


The fields fs, ft, and fd must specify FPRs valid for operands of type fmt; see 2.3 Floating-Point
Registers. If they are not valid, the result is undefined.


The operands must be values in format fmt; see 2.7 Valid Operands for FP Instructions. If 
they are not, the result is undefined and the value of the operand FPRs becomes undefined. 


Operation: 
StoreFPR(fd, fmt, ValueFPR(fs, fmt) - ValueFPR(ft, fmt))


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point
    Inexact               Unimplemented Operation
    Invalid Operation     Overflow
    Underflow
SWC1 Store Word from Floating-Point
31     26 25     21 20     16 15                          0
  SWC1      base       ft               offset
 111001
    6         5         5                 16
Format: SWC1 ft, offset(base) MIPS I
Purpose: To store a word from an FPR to memory.


Description: memory[base+offset] ← ft
The low 32-bit word from FPR ft is stored in memory at the location specified by the aligned
effective address. The 16-bit signed offset is added to the contents of GPR base to form the
effective address.


Restrictions:
An Address Error exception occurs if EffectiveAddress_{1..0} ≠ 0 (not word-aligned).


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 


Operation: 32-bit Processors

vAddr ← sign_extend(offset) + GPR[base]
if vAddr_{1..0} ≠ 0^{2} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)
data ← FGR[ft]
StoreMemory(uncached, WORD, data, pAddr, vAddr, DATA)


Operation: 64-bit Processors

vAddr ← sign_extend(offset) + GPR[base]
if vAddr_{1..0} ≠ 0^{2} then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)
pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^{2}))
bytesel ← vAddr_{2..0} xor (BigEndianCPU || 0^{2})
/* the bytes of the word are moved into the correct byte lanes */
if SizeFGR() = 64 then      /* 64-bit wide FGRs */
    data ← 0^{32-8*bytesel} || FGR[ft]_{31..0} || 0^{8*bytesel}    /* top or bottom wd of 64-bit data */
else                        /* 32-bit wide FGRs */
    data ← 0^{32-8*bytesel} || FGR[ft] || 0^{8*bytesel}            /* top or bottom wd of 64-bit data */
endif
StoreMemory(uncached, WORD, data, pAddr, vAddr, DATA)
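
The byte-lane placement for 64-bit processors can be tried out with a short sketch. The function name `swc1_lanes` is hypothetical, the 64-bit data path is modeled as a Python integer, and address translation and the ReverseEndian adjustment of pAddr are left out.

```python
def swc1_lanes(word, vaddr, big_endian_cpu):
    """Return (bytesel, 64-bit data) for a word-aligned SWC1 store."""
    # bytesel = vAddr[2..0] xor (BigEndianCPU || 00)
    bytesel = (vaddr & 0x7) ^ ((1 if big_endian_cpu else 0) << 2)
    # data = zeros above || word || 8*bytesel zero bits below
    data = (word & 0xFFFF_FFFF) << (8 * bytesel)
    return bytesel, data


little = swc1_lanes(0xDEADBEEF, 0x1004, big_endian_cpu=False)
big = swc1_lanes(0xDEADBEEF, 0x1004, big_endian_cpu=True)
```

For the same word-aligned address, a little-endian CPU places the word in the upper half of the doubleword and a big-endian CPU places it in the lower half.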


Exceptions: 
Coprocessor Unusable
Reserved Instruction
TLB Refill, TLB Invalid
TLB Modified
Address Error


SWXC1 Store Word Indexed from Floating-Point
31    26 25   21 20   16 15   11 10    6 5      0
  COP1X    base   index     fs      0     SWXC1
 010011                                   001000
    6       5       5       5       5       6
Format: SWXC1 fs, index(base) MIPS IV
Purpose: To store a word from an FPR to memory (GPR+GPR addressing).


Description: memory[base+index] ← fs
The low 32-bit word from FPR fs is stored in memory at the location specified by the aligned 
effective address. The contents of GPR index and GPR base are added to form the effective 
address. 


Restrictions: 
The Region bits of the effective address must be supplied by the contents of base. If
EffectiveAddress[63..62] ≠ base[63..62], the result is undefined.


An Address Error exception occurs if EffectiveAddress[1..0] ≠ 0 (not word-aligned).


MIPS IV: The low-order 2 bits of the offset field must be zero. If they are not, the result of the 
instruction is undefined. 
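
The Region-bit restriction can be checked with a few lines of Python. This is a sketch under the assumption of 64-bit register values; the helper name `region_bits_preserved` is hypothetical and only illustrates the comparison of bits 63..62.

```python
def region_bits_preserved(base_value, index_value):
    """True when bits 63..62 of base + index equal bits 63..62 of base."""
    mask64 = (1 << 64) - 1
    effective = (base_value + index_value) & mask64
    return (effective >> 62) == ((base_value & mask64) >> 62)
```

A False result corresponds to the case in which the result of the instruction is undefined.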


Operation:


vAddr ← GPR[base] + GPR[index]
if vAddr[1..0] ≠ 0^2 then SignalException(AddressError) endif
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)
pAddr ← pAddr[PSIZE-1..3] || (pAddr[2..0] xor (ReverseEndian || 0^2))
bytesel ← vAddr[2..0] xor (BigEndianCPU || 0^2)
/* the bytes of the word are moved into the correct byte lanes */
if SizeFGR() = 64 then /* 64-bit wide FGRs */
    data ← 0^(32-8*bytesel) || FGR[fs][31..0] || 0^(8*bytesel) /* top or bottom wd of 64-bit data */
else /* 32-bit wide FGRs */
    data ← 0^(32-8*bytesel) || FGR[fs] || 0^(8*bytesel) /* top or bottom wd of 64-bit data */
endif
StoreMemory(uncached, WORD, data, pAddr, vAddr, DATA)


Exceptions: 
TLB Refill, TLB Invalid 
TLB Modified 
Address Error 
Reserved Instruction 
Coprocessor Unusable


TRUNC.L.fmt Floating-Point Truncate to Long Fixed-Point


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt 0 fs fd TRUNC.L 
010001 00000 001001 
6 5 5 5 5 6 
Format: TRUNC.L.S fd, fs MIPS III 
TRUNC.L.D fd, fs 
Purpose: To convert an FP value to 64-bit fixed-point, rounding toward zero. 


Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed-point format,
rounding toward zero (rounding mode 1). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is
written to fd.


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for long fixed-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 


Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point
    Inexact                Unimplemented Operation
    Invalid Operation      Overflow


TRUNC.W.fmt Floating-Point Truncate to Word Fixed-Point


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt 0 fs fd TRUNC.W 
010001 00000 001101 
6 5 5 5 5 6 
Format: TRUNC.W.S fd, fs MIPS II
TRUNC.W.D fd, fs
Purpose: To convert an FP value to 32-bit fixed-point, rounding toward zero.


Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed-point format using
rounding toward zero (rounding mode 1). The result is placed in FPR fd.


When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1,
the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The
result depends on the FP exception model currently active.


• Precise exception model: The Invalid Operation flag is set in the FCSR. If the Invalid
Operation enable bit is set in the FCSR, no result is written to fd and an Invalid
Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is
written to fd.
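
The conversion rule above can be sketched in Python for the case where the Invalid Operation enable bit is clear. The function name `trunc_to_fixed` and its `bits` parameter are hypothetical; `bits=32` models the word format and `bits=64` the long format, and the second element of the returned pair stands in for the Invalid Operation flag.

```python
import math


def trunc_to_fixed(value, bits=32):
    """Truncate a float toward zero into a signed fixed-point range.

    Returns (result, invalid). When the source is NaN, Infinity, or out of
    range, the default result 2^(bits-1)-1 is returned with invalid=True.
    """
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if math.isnan(value) or math.isinf(value):
        return highest, True
    result = math.trunc(value)  # rounds toward zero
    if result < lowest or result > highest:
        return highest, True
    return result, False
```

Truncation discards the fraction regardless of sign, so 2.9 becomes 2 and -2.9 becomes -2.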


Restrictions: 
The fields fs and fd must specify valid FPRs; fs for type fmt and fd for word fixed-point; see 
2.3 Floating-Point Registers. If they are not valid, the result is undefined. 


The operand must be a value in format fmt; see 2.7 Valid Operands for FP Instructions. If it 
is not, the result is undefined and the value of the operand FPR becomes undefined. 


Operation: 
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 
Coprocessor Unusable 
Reserved Instruction 
Floating-Point 
Inexact Invalid Operation 
Overflow Unimplemented Operation


2.11 FPU Instruction Formats

An FPU instruction is a single 32-bit aligned word. The distinct FP instruction layouts are 
shown in Figure 2-16. Variable information is in lower-case labels, such as “offset”. 
Upper-case labels and any numbers indicate constant data. A table follows all the layouts 
that explains the fields used in them. Note that the same field may have different names in 
different instruction layout pictures. The field name is mnemonic to the function of that 
field in the instruction layout. The opcode tables and the instruction decode discussion use 
the canonical field names: opcode, fmt, nd, tf, and function. The other fields are not used 
for instruction decode. 


Figure 2-16 FPU Instruction Formats 


Immediate: load/store using register + offset addressing.

31    26 25   21 20   16 15                     0
  opcode   base     ft           offset
    6       5       5              16


Register: 2-register and 3-register formatted arithmetic operations.

31    26 25   21 20   16 15   11 10    6 5      0
   COP1     fmt     ft      fs      fd   function
    6       5       5       5       5       6


Register Immediate: data transfer -- CPU ↔ FPU register.

31    26 25   21 20   16 15   11 10             0
   COP1     sub     rt      fs          0
    6       5       5       5          11


Condition code, Immediate: conditional branches on FPU cc using PC + offset.

31    26 25   21 20 18 17 16 15                 0
   COP1     BC     cc  nd tf       offset
    6       5      3   1  1          16


Register to Condition Code: formatted FP compare.

31    26 25   21 20   16 15   11 10  8 7  6 5      0
   COP1     fmt     ft      fs     cc   0   function
    6       5       5       5      3    2      6


Condition Code, Register FP: FPU register move-conditional on FP cc.

31    26 25   21 20 18 17 16 15   11 10    6 5      0
   COP1     fmt    cc  0  tf    fs      fd    MOVCF
    6       5      3   1  1     5       5       6


Register-4: 4-register formatted arithmetic operations.

31    26 25   21 20   16 15   11 10    6 5    3 2    0
                                          function
  COP1X     fr      ft      fs      fd    op4   fmt3
    6       5       5       5       5      3      3


Register Index: Load/store using register + register addressing.

31    26 25   21 20   16 15   11 10    6 5      0
  COP1X    base   index     0       fd   function
    6       5       5       5       5       6


Register Index hint: Prefetch using register + register addressing.

31    26 25   21 20   16 15   11 10    6 5      0
  COP1X    base   index    hint     0     PREFX
    6       5       5       5       5       6


Condition Code, Register Integer: CPU register move-conditional on FP cc.

31    26 25   21 20 18 17 16 15   11 10    6 5      0
 SPECIAL    rs     cc  0  tf    rd      0     MOVCI
    6       5      3   1  1     5       5       6

Fields used in the instruction layouts:

Field           Description
BC              Branch Conditional instruction subcode (op=COP1)
base            CPU register: base address for address calculations
COP1            Coprocessor 1 primary opcode value in op field.
COP1X           Coprocessor 1 eXtended primary opcode value in op field.
cc              condition code specifier. For architecture levels prior to MIPS IV it must be zero.
fd              FPU register: destination (arithmetic, loads, move-to) or source (stores, move-from)
fmt             destination and/or operand type ("format") specifier
fr              FPU register: source
fs              FPU register: source
ft              FPU register: source (for stores, arithmetic) or destination (for loads)
function        function field specifying a function within a particular op operation code.
function:       op4 is a 3-bit function field specifying which 4-register arithmetic operation for
op4 + fmt3      COP1X, fmt3 is a 3-bit field specifying the format of the operands and destination.
                The combinations are shown as several distinct instructions in the opcode tables.
hint            hint field made available to cache controller for prefetch operation
index           CPU register, holds index address component for address calculations
MOVCF, MOVCI    Value in function field for conditional move. There is one value for the instruction
                with op=COP1, another for the instruction with op=SPECIAL.
nd              nullify delay. If set, branch is Likely and delay slot instruction is not executed. This
                must be zero for MIPS I.
offset          signed offset field used in address calculations
opcode          primary operation code (COP1, COP1X, LWC1, SWC1, LDC1, SDC1, SPECIAL)
PREFX           Value in function field for prefetch instruction for op=COP1X
rd              CPU register: destination
rs              CPU register: source
rt              CPU register: source / destination
SPECIAL         SPECIAL primary opcode value in op field.
sub             Operation subcode field for COP1 register immediate mode instructions.
tf              true/false. The condition from FP compare is tested for equality with tf bit.
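
As an illustration of the Register layout, the following Python sketch splits a 32-bit instruction word into its six fields using the bit positions shown in Figure 2-16. The function name `decode_register_format` is hypothetical, and the sample word is built from the TRUNC.W encoding given in this chapter with fmt=S taken as 10000 per the fmt encoding tables.

```python
def decode_register_format(word):
    """Extract the fields of a Register-format FPU instruction word."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "fmt": (word >> 21) & 0x1F,      # bits 25..21
        "ft": (word >> 16) & 0x1F,       # bits 20..16
        "fs": (word >> 11) & 0x1F,       # bits 15..11
        "fd": (word >> 6) & 0x1F,        # bits 10..6
        "function": word & 0x3F,         # bits 5..0
    }


# TRUNC.W.S with fd=2, fs=4: COP1=010001, fmt=S=10000, function=001101.
sample = (0b010001 << 26) | (0b10000 << 21) | (4 << 11) | (2 << 6) | 0b001101
fields = decode_register_format(sample)
```

The other layouts differ only in how bits 25..0 are divided, so the same shift-and-mask approach applies with different positions and widths.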
2.12 FPU (CP1) Instruction Opcode Bit Encoding


2.12.1 Instruction Decode 


This section describes the encoding of the Floating-Point Unit (FPU) instructions for the
four levels of the MIPS architecture, MIPS I through MIPS IV. Each architecture level
includes the instructions in the previous level; MIPS IV includes all instructions in MIPS
I, MIPS II, and MIPS III. This section presents eight different views of the instruction
encoding.


• Separate encoding tables for each architecture level.


• A MIPS IV encoding table showing the architecture level at which each
opcode was originally defined and subsequently modified (if modified).


• Separate encoding tables for each architecture revision showing the changes
made during that revision.


Instruction field names are printed in bold in this section. 


The primary opcode field is decoded first. The opcode values LWC1, SWC1, LDC1, and
SDC1 fully specify FPU load and store instructions. The opcode values COP1, COP1X,
and SPECIAL specify instruction classes. Instructions within a class are further specified
by values in other fields.


(1) COP1 Instruction Class 


The opcode=COP1 instruction class encodes most of the FPU instructions. The class is
further decoded by examining the fmt field. The fmt values fully specify the CPU ↔ FPU
register move instructions and specify the S, D, W, L, and BC instruction classes.


The opcode=COP1 + fmt=BC instruction class encodes the conditional branch
instructions. The class is further decoded, and the instructions fully specified, by
examining the nd and tf fields.


The opcode=COP1 + fmt=(S, D, W, or L) instruction classes encode instructions that
operate on formatted (typed) operands. Each of these instruction classes is further decoded
by examining the function field. With one exception the function values fully specify
instructions. The exception is the MOVCF instruction class.


The opcode=COP1 + fmt=(S or D) + function=MOVCF instruction class encodes the
MOVT.fmt and MOVF.fmt conditional move instructions (to move FP values based on FP
condition codes). The class is further decoded, and the instructions fully specified, by
examining the tf field.


(2) COP1X Instruction Class 


The opcode=COP1X instruction class encodes the indexed load/store instructions, the
indexed prefetch, and the multiply accumulate instructions. The class is further decoded,
and the instructions fully specified, by examining the function field.


† An exception to the rule that each architecture level includes the instructions of the
previous level: the reserved, but never implemented, Coprocessor 3 instructions were removed or
changed to another use starting in MIPS III.


(3) SPECIAL Instruction Class


The opcode=SPECIAL instruction class is further decoded by examining the function 
field. The only function value that applies to FPU instruction encoding is the MOVCI 
instruction class. The remainder of the function values encode CPU instructions. 


The opcode=SPECIAL + function=MOVCI instruction class encodes the MOVT and
MOVF conditional move instructions (to move CPU registers based on FP condition
codes). The class is further decoded, and the instructions fully specified, by examining the
tf field.
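
The decode order described in this section can be sketched as a small Python function that lists which fields are examined for a given instruction word. The names `decode_path`, `OPCODES`, and `COP1_FMT` are hypothetical. The numeric values are assumptions read from the instruction layouts and from the row and column positions of the MIPS IV encoding tables, and should be checked against those tables before being relied on.

```python
# Primary opcode values (bits 31..26) used for FPU instruction encoding.
OPCODES = {
    0b000000: "SPECIAL", 0b010001: "COP1", 0b010011: "COP1X",
    0b110001: "LWC1", 0b110101: "LDC1", 0b111001: "SWC1", 0b111101: "SDC1",
}
# fmt field values (bits 25..21) when opcode=COP1.
COP1_FMT = {0: "MFC1", 1: "DMFC1", 2: "CFC1", 4: "MTC1", 5: "DMTC1",
            6: "CTC1", 8: "BC", 16: "S", 17: "D", 20: "W", 21: "L"}
MOVCF = 0b010001  # function value of the MOVCF class (assumed)
MOVCI = 0b000001  # function value of the MOVCI class (assumed)


def decode_path(word):
    """Return the (field, value) pairs examined while decoding word."""
    opcode = OPCODES.get((word >> 26) & 0x3F)
    if opcode is None:
        return None  # opcode is not used for FPU instruction encoding
    path = [("opcode", opcode)]
    function = word & 0x3F
    tf = (word >> 16) & 1
    if opcode == "COP1":
        fmt = COP1_FMT.get((word >> 21) & 0x1F, "reserved")
        path.append(("fmt", fmt))
        if fmt == "BC":
            path += [("nd", (word >> 17) & 1), ("tf", tf)]
        elif fmt in ("S", "D", "W", "L"):
            path.append(("function", function))
            if fmt in ("S", "D") and function == MOVCF:
                path.append(("tf", tf))
    elif opcode == "COP1X":
        path.append(("function", function))
    elif opcode == "SPECIAL":
        path.append(("function", function))
        if function == MOVCI:
            path.append(("tf", tf))
    return path
```

Load and store opcodes stop after the first step, while a branch with opcode=COP1 and fmt=BC is resolved only after the nd and tf bits are read.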


2.12.2 Instruction Subsets of MIPS III and MIPS IV Processors

MIPS III processors, such as the R4200, R4300, and R4400, have a processor mode in
which only the MIPS II instructions are valid. The MIPS II encoding table describes the
MIPS II-only mode.


MIPS IV processors, such as the R5000 and R10000, have processor modes in which only
the MIPS II or MIPS III instructions are valid. The MIPS II encoding table describes the
MIPS II-only mode. The MIPS III encoding table describes the MIPS III-only mode.
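
Because each level includes the instructions of the previous one, whether an instruction is valid in a restricted mode reduces to comparing architecture levels. The Python sketch below is only an illustration: `DEFINED_AT` and `valid_in_mode` are hypothetical names, and the four entries are the levels given on the Format lines of the instruction descriptions earlier in this chapter.

```python
# Architecture level (1..4 for MIPS I..IV) at which each instruction is defined.
DEFINED_AT = {
    "SWC1": 1,
    "TRUNC.W.fmt": 2,
    "TRUNC.L.fmt": 3,
    "SWXC1": 4,
}


def valid_in_mode(mnemonic, mode_level):
    """True when the instruction is valid in a mode limited to mode_level."""
    return DEFINED_AT[mnemonic] <= mode_level
```

For example, TRUNC.L.fmt is not valid in a MIPS II-only mode, while TRUNC.W.fmt is.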


Chapter 2 FPU Instruction Set 


Table 2-23 FPU (CP1) Instruction Encoding - MIPS I Architecture

Instructions encoded by the opcode field (bits 31..26).

        bits 28..26
bits    0         1         2         3         4         5         6         7
31..29  000       001       010       011       100       101       110       111
0 000
1 001
2 010             COP1 δ
3 011
4 100
5 101
6 110             LWC1
7 111             SWC1

Instructions encoded by the fmt field (bits 25..21) when opcode=COP1.

        bits 23..21
bits    0         1         2         3         4         5         6         7
25..24  000       001       010       011       100       101       110       111
0 00    MFC1      *         CFC1      *         MTC1      *         CTC1      *
1 01    BC δ      *         *         *         *         *         *         *
2 10    S δ       D δ       *         *         W δ       *         *         *
3 11    *         *         *         *         *         *         *         *

Instructions encoded by the tf field (bit 16) when opcode=COP1 and fmt=BC.

        tf (bit 16)
        0         1
        BC1F      BC1T

Instructions encoded by the function field (bits 5..0) when opcode=COP1 and fmt = S, D, or W.

encoding when fmt = S

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       *         ABS       MOV       NEG
1 001   *         *         *         *         *         *         *         *
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   *         CVT.D     *         *         CVT.W     *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = D

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       *         ABS       MOV       NEG
1 001   *         *         *         *         *         *         *         *
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     *         *         *         CVT.W     *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = W

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   *         *         *         *         *         *         *         *
1 001   *         *         *         *         *         *         *         *
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     CVT.D     *         *         *         *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   *         *         *         *         *         *         *         *
7 111   *         *         *         *         *         *         *         *

Table 2-24 FPU (CP1) Instruction Encoding - MIPS II Architecture

Instructions encoded by the opcode field (bits 31..26).

        bits 28..26
bits    0         1         2         3         4         5         6         7
31..29  000       001       010       011       100       101       110       111
0 000
1 001
2 010             COP1 δ
3 011
4 100
5 101
6 110             LWC1                                    LDC1
7 111             SWC1                                    SDC1

Instructions encoded by the fmt field (bits 25..21) when opcode=COP1.

        bits 23..21
bits    0         1         2         3         4         5         6         7
25..24  000       001       010       011       100       101       110       111
0 00    MFC1      *         CFC1      *         MTC1      *         CTC1      *
1 01    BC δ      *         *         *         *         *         *         *
2 10    S δ       D δ       *         *         W δ       *         *         *
3 11    *         *         *         *         *         *         *         *

Instructions encoded by the nd (bit 17) and tf (bit 16) fields when opcode=COP1 and fmt=BC.

                 tf (bit 16)
                 0         1
nd (bit 17)  0   BC1F      BC1T
             1   BC1FL     BC1TL

Instructions encoded by the function field (bits 5..0) when opcode=COP1 and fmt = S, D, or W.

encoding when fmt = S

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
1 001   *         *         *         *         ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   *         CVT.D     *         *         CVT.W     *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = D

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
1 001   *         *         *         *         ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     *         *         *         CVT.W     *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = W

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   *         *         *         *         *         *         *         *
1 001   *         *         *         *         *         *         *         *
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     CVT.D     *         *         *         *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   *         *         *         *         *         *         *         *
7 111   *         *         *         *         *         *         *         *

Table 2-25 FPU (CP1) Instruction Encoding - MIPS III Architecture

Instructions encoded by the opcode field (bits 31..26).

        bits 28..26
bits    0         1         2         3         4         5         6         7
31..29  000       001       010       011       100       101       110       111
0 000
1 001
2 010             COP1 δ
3 011
4 100
5 101
6 110             LWC1                                    LDC1
7 111             SWC1                                    SDC1

Instructions encoded by the fmt field (bits 25..21) when opcode=COP1.

        bits 23..21
bits    0         1         2         3         4         5         6         7
25..24  000       001       010       011       100       101       110       111
0 00    MFC1      DMFC1     CFC1      *         MTC1      DMTC1     CTC1      *
1 01    BC δ      *         *         *         *         *         *         *
2 10    S δ       D δ       *         *         W δ       L δ       *         *
3 11    *         *         *         *         *         *         *         *

Instructions encoded by the nd (bit 17) and tf (bit 16) fields when opcode=COP1 and fmt=BC.

                 tf (bit 16)
                 0         1
nd (bit 17)  0   BC1F      BC1T
             1   BC1FL     BC1TL

Instructions encoded by the function field (bits 5..0) when opcode=COP1 and fmt = S, D, W, or L.

encoding when fmt = S

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
1 001   ROUND.L   TRUNC.L   CEIL.L    FLOOR.L   ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   *         CVT.D     *         *         CVT.W     CVT.L     *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = D

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
1 001   ROUND.L   TRUNC.L   CEIL.L    FLOOR.L   ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     *         *         *         CVT.W     CVT.L     *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = W or L

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   *         *         *         *         *         *         *         *
1 001   *         *         *         *         *         *         *         *
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     CVT.D     *         *         *         *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   *         *         *         *         *         *         *         *
7 111   *         *         *         *         *         *         *         *

Table 2-26 FPU (CP1) Instruction Encoding - MIPS IV Architecture

Instructions encoded by the opcode field (bits 31..26).

        bits 28..26
bits    0         1         2         3         4         5         6         7
31..29  000       001       010       011       100       101       110       111
0 000   SPECIAL δ, β
1 001
2 010             COP1 δ              COP1X δ
3 011
4 100
5 101
6 110             LWC1                                    LDC1
7 111             SWC1                                    SDC1

Instructions encoded by the fmt field (bits 25..21) when opcode=COP1.

        bits 23..21
bits    0         1         2         3         4         5         6         7
25..24  000       001       010       011       100       101       110       111
0 00    MFC1      DMFC1     CFC1      *         MTC1      DMTC1     CTC1      *
1 01    BC δ      *         *         *         *         *         *         *
2 10    S δ       D δ       *         *         W δ       L δ       *         *
3 11    *         *         *         *         *         *         *         *

Instructions encoded by the nd (bit 17) and tf (bit 16) fields when opcode=COP1 and fmt=BC.

                 tf (bit 16)
                 0         1
nd (bit 17)  0   BC1F      BC1T
             1   BC1FL     BC1TL

Instructions encoded by the function field (bits 5..0) when opcode=COP1 and fmt = S, D, W, or L.

encoding when fmt = S

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
1 001   ROUND.L   TRUNC.L   CEIL.L    FLOOR.L   ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010   *         MOVCF δ   MOVZ      MOVN      *         RECIP     RSQRT     *
3 011   *         *         *         *         *         *         *         *
4 100   *         CVT.D     *         *         CVT.W     CVT.L     *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = D

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
1 001   ROUND.L   TRUNC.L   CEIL.L    FLOOR.L   ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010   *         MOVCF δ   MOVZ      MOVN      *         RECIP     RSQRT     *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     *         *         *         CVT.W     CVT.L     *         *
5 101   *         *         *         *         *         *         *         *
6 110   C.F α     C.UN α    C.EQ α    C.UEQ α   C.OLT α   C.ULT α   C.OLE α   C.ULE α
7 111   C.SF α    C.NGLE α  C.SEQ α   C.NGL α   C.LT α    C.NGE α   C.LE α    C.NGT α

encoding when fmt = W or L

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   *         *         *         *         *         *         *         *
1 001   *         *         *         *         *         *         *         *
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     CVT.D     *         *         *         *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   *         *         *         *         *         *         *         *
7 111   *         *         *         *         *         *         *         *

Instructions encoded by the function field (bits 5..0) when opcode=COP1X.

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   LWXC1     LDXC1     *         *         *         *         *         *
1 001   SWXC1     SDXC1     *         *         *         *         *         PREFX
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   MADD.S    MADD.D    *         *         *         *         *         *
5 101   MSUB.S    MSUB.D    *         *         *         *         *         *
6 110   NMADD.S   NMADD.D   *         *         *         *         *         *
7 111   NMSUB.S   NMSUB.D   *         *         *         *         *         *

Instructions encoded by the tf field (bit 16) when opcode=COP1, fmt = S or D, and function=MOVCF.

        tf (bit 16)
        0             1
        MOVF (fmt)    MOVT (fmt)

These are the MOVF.fmt and MOVT.fmt instructions. They should not be confused with MOVF and MOVT.

Instruction class encoded by the function field (bits 5..0) when opcode=SPECIAL.

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000             MOVCI δ
7 111

Instructions encoded by the tf field (bit 16) when opcode = SPECIAL and function=MOVCI.

        tf (bit 16)
        0         1
        MOVF      MOVT

These are the MOVF and MOVT instructions. They should not be confused with MOVF.fmt and MOVT.fmt.

Table 2-27 Architecture Level In Which FPU Instructions are Defined or Extended

The architecture level in which each MIPS IV encoding was defined is indicated in parentheses by 1, 2, 3, or 4 (for
architecture level I, II, III, or IV). If an instruction or instruction class was later extended, the extending level
is indicated after the defining level.

Instructions encoded by the opcode field (bits 31..26).

        bits 28..26
bits    0           1           2           3           4           5           6           7
31..29  000         001         010         011         100         101         110         111
0 000   SPECIAL β (4)
1 001
2 010               COP1(1,2,3,4)           COP1X(4)
3 011
4 100
5 101
6 110               LWC1(1)                                         LDC1(2)
7 111               SWC1(1)                                         SDC1(2)

Instructions encoded by the fmt field (bits 25..21) when opcode=COP1.

        bits 23..21
bits    0           1           2           3           4           5           6           7
25..24  000         001         010         011         100         101         110         111
0 00    MFC1(1)     DMFC1(3)    CFC1(1)     *(1)        MTC1(1)     DMTC1(3)    CTC1(1)     *(1)
1 01    BC(1,2,4)   *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)
2 10    S(1,2,3,4)  D(1,2,3,4)  *(1)        *(1)        W(1,2,3,4)  L(3,4)      *(1)        *(1)
3 11    *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)

Instructions encoded by the nd (bit 17) and tf (bit 16) fields when opcode=COP1 and fmt=BC.

                 tf (bit 16)
                 0           1
nd (bit 17)  0   BC1F(1,4)   BC1T(1,4)
             1   BC1FL(2,4)  BC1TL(2,4)

Instructions encoded by the function field (bits 5..0) when opcode=COP1 and fmt = S, D, W, or L.

encoding when fmt = S

        bits 2..0
bits    0           1           2           3           4           5           6           7
5..3    000         001         010         011         100         101         110         111
0 000   ADD(1)      SUB(1)      MUL(1)      DIV(1)      SQRT(2)     ABS(1)      MOV(1)      NEG(1)
1 001   ROUND.L(3)  TRUNC.L(3)  CEIL.L(3)   FLOOR.L(3)  ROUND.W(2)  TRUNC.W(2)  CEIL.W(2)   FLOOR.W(2)
2 010   *(1)        MOVCF(4)    MOVZ(4)     MOVN(4)     *(1)        RECIP(4)    RSQRT(4)    *(1)
3 011   *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)
4 100   *(1)        CVT.D(1,3)  *(1)        *(1)        CVT.W(1)    CVT.L(3)    *(1)        *(1)
5 101   *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)
6 110   C.F(1,4)    C.UN(1,4)   C.EQ(1,4)   C.UEQ(1,4)  C.OLT(1,4)  C.ULT(1,4)  C.OLE(1,4)  C.ULE(1,4)
7 111   C.SF(1,4)   C.NGLE(1,4) C.SEQ(1,4)  C.NGL(1,4)  C.LT(1,4)   C.NGE(1,4)  C.LE(1,4)   C.NGT(1,4)

encoding when fmt = D

        bits 2..0
bits    0           1           2           3           4           5           6           7
5..3    000         001         010         011         100         101         110         111
0 000   ADD(1)      SUB(1)      MUL(1)      DIV(1)      SQRT(2)     ABS(1)      MOV(1)      NEG(1)
1 001   ROUND.L(3)  TRUNC.L(3)  CEIL.L(3)   FLOOR.L(3)  ROUND.W(2)  TRUNC.W(2)  CEIL.W(2)   FLOOR.W(2)
2 010   *(1)        MOVCF(4)    MOVZ(4)     MOVN(4)     *(1)        RECIP(4)    RSQRT(4)    *(1)
3 011   *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)
4 100   CVT.S(1,3)  *(1)        *(1)        *(1)        CVT.W(1)    CVT.L(3)    *(1)        *(1)
5 101   *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)        *(1)
6 110   C.F(1,4)    C.UN(1,4)   C.EQ(1,4)   C.UEQ(1,4)  C.OLT(1,4)  C.ULT(1,4)  C.OLE(1,4)  C.ULE(1,4)
7 111   C.SF(1,4)   C.NGLE(1,4) C.SEQ(1,4)  C.NGL(1,4)  C.LT(1,4)   C.NGE(1,4)  C.LE(1,4)   C.NGT(1,4)

encoding when fmt = W or L

        bits 2..0
bits    0           1           2           3           4           5           6           7
5..3    000         001         010         011         100         101         110         111
4 100   CVT.S(1,3)  CVT.D(1,3)  *           *           *           *           *           *

Instructions encoded by the function field (bits 5..0) when opcode=COP1X.

        bits 2..0
bits    0           1           2           3           4           5           6           7
5..3    000         001         010         011         100         101         110         111
0 000   LWXC1(4)    LDXC1(4)    *(4)        *(4)        *(4)        *(4)        *(4)        *(4)
1 001   SWXC1(4)    SDXC1(4)    *(4)        *(4)        *(4)        *(4)        *(4)        PREFX(4)
2 010   *(4)        *(4)        *(4)        *(4)        *(4)        *(4)        *(4)        *(4)
3 011   *(4)        *(4)        *(4)        *(4)        *(4)        *(4)        *(4)        *(4)
4 100   MADD.S(4)   MADD.D(4)   *(4)        *(4)        *(4)        *(4)        *(4)        *(4)
5 101   MSUB.S(4)   MSUB.D(4)   *(4)        *(4)        *(4)        *(4)        *(4)        *(4)
6 110   NMADD.S(4)  NMADD.D(4)  *(4)        *(4)        *(4)        *(4)        *(4)        *(4)
7 111   NMSUB.S(4)  NMSUB.D(4)  *(4)        *(4)        *(4)        *(4)        *(4)        *(4)

Instructions encoded by the tf field (bit 16) when opcode=COP1, fmt = S or D, and function=MOVCF.

        tf (bit 16)
        0                1
        MOVF (fmt)(4)    MOVT (fmt)(4)

These are the MOVF.fmt and MOVT.fmt instructions. They should not be confused with MOVF and MOVT.

Instruction class encoded by the function field (bits 5..0) when opcode=SPECIAL.

        bits 2..0
bits    0           1           2           3           4           5           6           7
5..3    000         001         010         011         100         101         110         111
0 000               MOVCI(4)
7 111

Instructions encoded by the tf field (bit 16) when opcode = SPECIAL and function=MOVCI.

        tf (bit 16)
        0           1
        MOVF(4)     MOVT(4)

These are the MOVF and MOVT instructions. They should not be confused with MOVF.fmt and MOVT.fmt.

Table 2-28 FPU Instruction Encoding Changes - MIPS II Revision

An instruction encoding is shown if the instruction is added or extended in this architecture revision.
An instruction class, like COP1, is shown if the instruction class is added in this architecture
revision.

Instructions encoded by the opcode field (bits 31..26).

        bits 28..26
bits    0         1         2         3         4         5         6         7
31..29  000       001       010       011       100       101       110       111
0 000
1 001
2 010
3 011
4 100
5 101
6 110                                                     LDC1
7 111                                                     SDC1

Instructions encoded by the fmt field (bits 25..21) when opcode=COP1.

        bits 23..21
bits    0         1         2         3         4         5         6         7
25..24  000       001       010       011       100       101       110       111
0 00
1 01
2 10
3 11

Instructions encoded by the nd (bit 17) and tf (bit 16) fields when opcode=COP1 and fmt=BC.

                 tf (bit 16)
                 0         1
nd (bit 17)  0
             1   BC1FL     BC1TL

Instructions encoded by the function field (bits 5..0) when opcode=COP1 and fmt = S, D, or W.

encoding when fmt = S

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000                                           SQRT
1 001                                           ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010
3 011
4 100
5 101
6 110
7 111

encoding when fmt = D

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000                                           SQRT
1 001                                           ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
2 010
3 011
4 100
5 101
6 110
7 111

encoding when fmt = W

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111

Table 2-29 FPU Instruction Encoding Changes - MIPS III Revision

An instruction encoding is shown if the instruction is added or extended in this architecture revision.
An instruction class, like COP1, is shown if the instruction class is added in this architecture
revision.

Instructions encoded by the opcode field (bits 31..26).

        bits 28..26
bits    0         1         2         3         4         5         6         7
31..29  000       001       010       011       100       101       110       111
0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111

Instructions encoded by the fmt field (bits 25..21) when opcode=COP1.

        bits 23..21
bits    0         1         2         3         4         5         6         7
25..24  000       001       010       011       100       101       110       111
0 00              DMFC1                                   DMTC1
1 01
2 10                                                      L δ
3 11

Instructions encoded by the nd (bit 17) and tf (bit 16) fields when opcode=COP1 and fmt=BC.

                 tf (bit 16)
                 0         1
nd (bit 17)  0
             1   BC1FL     BC1TL

Instructions encoded by the function field (bits 5..0) when opcode=COP1 and fmt = S, D, or L.

encoding when fmt = S

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000
1 001   ROUND.L   TRUNC.L   CEIL.L    FLOOR.L
2 010
3 011
4 100                                                     CVT.L
5 101
6 110
7 111

encoding when fmt = D

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000
1 001   ROUND.L   TRUNC.L   CEIL.L    FLOOR.L
2 010
3 011
4 100                                                     CVT.L
5 101
6 110
7 111

encoding when fmt = L

        bits 2..0
bits    0         1         2         3         4         5         6         7
5..3    000       001       010       011       100       101       110       111
0 000   *         *         *         *         *         *         *         *
1 001   *         *         *         *         *         *         *         *
2 010   *         *         *         *         *         *         *         *
3 011   *         *         *         *         *         *         *         *
4 100   CVT.S     CVT.D     *         *         *         *         *         *
5 101   *         *         *         *         *         *         *         *
6 110   *         *         *         *         *         *         *         *
7 111   *         *         *         *         *         *         *         *
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Table 2-30 FPU Instruction Encoding Changes - MIPS IV Revision 
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An instruction encoding is shown if the instruction is added or extended in this architecture revision. 
An instruction class, like COP1X, is shown if the instruction class is added in this architecture 


revision. 


Instructions encoded by the opcode field. 


bits 


31. 
0 


YAU FWY 


.29 


000 
001 
010 
011 
100 
101 
110 
111 


bits 28..26 


0 
000 


31 26 


1 2 
001 010 


3 
O11 


100 


101 


110 


111 


COPIX 8 


Instructions encoded by the fmt field when opcode=COP1/. 


[tm] bits 23.21 


bits 


0 
000 


31 26 25 21 


opcode 
= COPI 


1 2 
001 010 


3 
O11 


100 


101 


110 


111 


Instructions encoded by the nd and tf fields when opcode=COP/ and fmt=BC. 


31 26 25 


21 


17 16 


opcode 
= COPI 


fmt 
= BC 
Table 2-30 (cont.) FPU Instruction Encoding Changes - MIPS IV Revision

Instructions encoded by the function field when opcode=COP1 and fmt = S, D, W, or L.

  31     26 25   21                           0
  opcode    fmt
  = COP1    = S

Encoding when fmt = S

                                  bits 2..0
  bits     0        1        2        3        4        5        6        7
  5..3     000      001      010      011      100      101      110      111
  0  000
  1  001
  2  010            MOVCF δ  MOVZ     MOVN              RECIP    RSQRT
  3  011
  4  100
  5  101
  6  110   C.F      C.UN     C.EQ     C.UEQ    C.OLT    C.ULT    C.OLE    C.ULE
  7  111   C.SF     C.NGLE   C.SEQ    C.NGL    C.LT     C.NGE    C.LE     C.NGT

  31     26 25   21                           0
  opcode    fmt
  = COP1    = D

Encoding when fmt = D

                                  bits 2..0
  bits     0        1        2        3        4        5        6        7
  5..3     000      001      010      011      100      101      110      111
  0  000
  1  001
  2  010            MOVCF δ  MOVZ     MOVN              RECIP    RSQRT
  3  011
  4  100
  5  101
  6  110   C.F      C.UN     C.EQ     C.UEQ    C.OLT    C.ULT    C.OLE    C.ULE
  7  111   C.SF     C.NGLE   C.SEQ    C.NGL    C.LT     C.NGE    C.LE     C.NGT

  31     26 25   21                           0
  opcode    fmt
  = COP1    = W, L

Encoding when fmt = W or L

                                  bits 2..0
  bits     0        1        2        3        4        5        6        7
  5..3     000      001      010      011      100      101      110      111
  0  000
  1  001
  2  010
  3  011
  4  100
  5  101
  6  110
  7  111
Table 2-30 (cont.) FPU Instruction Encoding Changes - MIPS IV Revision
Instructions encoded by the function field when opcode=COP1X.

  31     26                                   5        0
  opcode                                      function
  = COP1X

                                  bits 2..0
  bits     0        1        2        3        4        5        6        7
  5..3     000      001      010      011      100      101      110      111
  0  000   LWXC1    LDXC1    *        *        *        *        *        *
  1  001   SWXC1    SDXC1    *        *        *        *        *        PREFX
  2  010   *        *        *        *        *        *        *        *
  3  011   *        *        *        *        *        *        *        *
  4  100   MADD.S   MADD.D   *        *        *        *        *        *
  5  101   MSUB.S   MSUB.D   *        *        *        *        *        *
  6  110   NMADD.S  NMADD.D  *        *        *        *        *        *
  7  111   NMSUB.S  NMSUB.D  *        *        *        *        *        *
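
The COP1X function table lends itself to a direct table lookup. The following C sketch
shows one possible way to do it; the function name is hypothetical, the COP1X opcode
value (010011) is taken from the general MIPS IV encoding, and a NULL result stands for
an encoding marked reserved.

```c
#include <stdint.h>
#include <stddef.h>

enum { OP_COP1X = 0x13 };   /* major opcode 010011 (assumed, not stated in this table) */

/* Return the COP1X mnemonic for an instruction word, or NULL when the word
 * is not a COP1X instruction or its function value is reserved. */
const char *cop1x_mnemonic(uint32_t word)
{
    /* Indexed by the 6-bit function field (bits 5..3 select the row,
     * bits 2..0 the column); unlisted entries default to NULL. */
    static const char *const names[64] = {
        [0x00] = "LWXC1",   [0x01] = "LDXC1",
        [0x08] = "SWXC1",   [0x09] = "SDXC1",   [0x0F] = "PREFX",
        [0x20] = "MADD.S",  [0x21] = "MADD.D",
        [0x28] = "MSUB.S",  [0x29] = "MSUB.D",
        [0x30] = "NMADD.S", [0x31] = "NMADD.D",
        [0x38] = "NMSUB.S", [0x39] = "NMSUB.D",
    };

    if ((word >> 26) != OP_COP1X)
        return NULL;
    return names[word & 0x3F];
}
```

On real hardware a reserved encoding raises an exception rather than returning a value;
the NULL here only marks the entry as not decodable by this sketch.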


Instructions encoded by the tf field when opcode=COP1, fmt = S or D, and function=MOVCF.

  31     26 25   21       16                  5        0
  opcode    fmt           tf                  function
  = COP1    = S, D                            = MOVCF

  tf (bit 16)   0            1
                MOVF (fmt)   MOVT (fmt)

These are the MOVF.fmt and MOVT.fmt instructions. They should not be confused with MOVF
and MOVT.


Instruction class encoded by the function field when opcode=SPECIAL.

  31     26                                   5        0
  opcode                                      function
  = SPECIAL

                                  bits 2..0
  bits     0        1        2        3        4        5        6        7
  5..3     000      001      010      011      100      101      110      111
  0  000            MOVCI δ
  7  111


Instructions encoded by the tf field when opcode = SPECIAL and function=MOVCI.

  31     26               16                  5        0
  opcode                  tf                  function
  = SPECIAL                                   = MOVCI

  tf (bit 16)   0            1
                MOVF         MOVT

These are the MOVF and MOVT instructions. They should not be confused with MOVF.fmt and
MOVT.fmt.
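
Because the CPU-register and FP-register conditional moves share their mnemonic stems, a
decoder has to look at the opcode and function fields before it interprets the tf bit. The
following C sketch separates the two cases under stated assumptions: the function name is
hypothetical, and the numeric values (SPECIAL = 000000, COP1 = 010001, MOVCI = 000001,
MOVCF = 010001, fmt S = 16, fmt D = 17) are taken from the general MIPS IV encoding.

```c
#include <stdint.h>
#include <stddef.h>

/* Classify a conditional move on an FP condition code.
 * Returns the mnemonic, or NULL if the word is neither form. */
const char *fp_cc_move_mnemonic(uint32_t word)
{
    unsigned opcode   = word >> 26;            /* bits 31..26 */
    unsigned fmt      = (word >> 21) & 0x1F;   /* bits 25..21 */
    unsigned tf       = (word >> 16) & 0x1;    /* bit 16 */
    unsigned function = word & 0x3F;           /* bits 5..0 */

    /* opcode = SPECIAL, function = MOVCI: moves CPU registers. */
    if (opcode == 0x00 && function == 0x01)
        return tf ? "MOVT" : "MOVF";

    /* opcode = COP1, function = MOVCF, fmt = S or D: moves FP registers. */
    if (opcode == 0x11 && function == 0x11) {
        if (fmt == 16)
            return tf ? "MOVT.S" : "MOVF.S";
        if (fmt == 17)
            return tf ? "MOVT.D" : "MOVF.D";
    }
    return NULL;
}
```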
Key to all FPU (CP1) instruction encoding tables:

  *      This opcode is reserved for future use. An attempt to execute it causes either
         a Reserved Instruction exception or a Floating Point Unimplemented
         Operation Exception. The choice of exception is implementation specific.

         The table shows 16 compare instructions with values named C.condition
         where “condition” is a comparison condition such as “EQ”. These encoding
         values are all documented in the instruction description titled “C.cond.fmt”.

         The SPECIAL instruction class was defined in MIPS I for CPU instructions.
         An FPU instruction was first added to the instruction class in MIPS IV.

  δ      (also italic opcode name) This opcode indicates an instruction class. The
         instruction word must be further decoded by examining additional tables that
         show values for another instruction field.

         The COP1X opcode in MIPS IV was the COP3 opcode in MIPS I and II and a
         reserved instruction in MIPS III.

         These opcodes are not FPU operations. For further information on them, look
         in 1.11 CPU Instruction Encoding.

  (fmt)  This opcode is a conditional move of formatted FP registers: either MOVF.D,
         MOVF.S, MOVT.D, or MOVT.S. It should not be confused with the
         similarly-named MOVF or MOVT instruction that moves CPU registers.


R5000 Instruction Hazards 


3.1 Introduction 


This chapter identifies the R5000 instruction hazards. Certain combinations of instructions
are not permitted because the results of executing them are unpredictable when they
coincide with events such as pipeline delays, cache misses, interrupts, and exceptions.


Most hazards result from instructions modifying and reading state in different pipeline 
stages. Such hazards are defined between pairs of instructions, not on a single instruction 
in isolation. Other hazards are associated with restartability of instructions in the presence 
of exceptions. 


For the following code hazards, the behavior is undefined and unpredictable. 


3.2 List of Instruction Hazards 
Any instruction that would modify the PageMask, EntryHi, EntryLo0, EntryLo1,
or Random CP0 Registers should not be followed by a TLBWR instruction. There
should be at least two integer instructions between the register modification and
the TLBWR instruction.


Any instruction that would modify the PageMask, EntryHi, EntryLo0, EntryLo1,
or Index CP0 Registers should not be followed by a TLBWI instruction. There
should be at least two integer instructions between the register modification and
the TLBWI instruction.


Any instruction that would modify the Index CP0 Register or the contents of the
JTLB should not be followed by a TLBR instruction. There should be at least two
integer instructions between the register modification and the TLBR instruction.


Any instruction that would modify the PageMask or EntryHi CP0 Registers or
the contents of the JTLB should not be followed by a TLBP instruction. There
should be at least two integer instructions between the register modification and
the TLBP instruction.


Any instruction that would modify the EPC, ErrorEPC, or Status CP0 Registers
should not be followed by an ERET instruction. There should be at least two
integer instructions between the register modification and the ERET instruction.


A branch or jump instruction is not allowed to be in the delay slot of another
branch or jump instruction. This sequence is illegal in the MIPS architecture.

The two instructions preceding any DIV, DIVU, DDIV, DDIVU, MULT, MULTU,
DMULT or DMULTU instructions should not read the HI or LO registers. There
should be at least two integer instructions between the register read and the
register modification.
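
The "at least two instructions between" rule for the TLB write hazards can be checked
mechanically over a buffer of instruction words. The following C sketch is one possible
checker, not a complete hazard tool: the function names are hypothetical, the encodings
(MTC0 and DMTC0 as opcode COP0 with rs = 00100 and 00101, TLBWI = 0x42000002,
TLBWR = 0x42000006, and the CP0 register numbers) are taken from the general MIPS
architecture rather than from this chapter, and it counts any intervening instruction as
filler without verifying that it is an integer instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CP0 register numbers (assumed from the general MIPS architecture). */
enum { CP0_INDEX = 0, CP0_RANDOM = 1, CP0_ENTRYLO0 = 2, CP0_ENTRYLO1 = 3,
       CP0_PAGEMASK = 5, CP0_ENTRYHI = 10 };

#define TLBWI 0x42000002u
#define TLBWR 0x42000006u

/* If w is MTC0 or DMTC0, store the destination CP0 register and return true. */
static bool writes_cp0(uint32_t w, unsigned *reg)
{
    unsigned rs = (w >> 21) & 0x1F;

    if ((w >> 26) != 0x10 || (rs != 0x04 && rs != 0x05))
        return false;
    *reg = (w >> 11) & 0x1F;
    return true;
}

/* Does a write to CP0 register reg conflict with the given TLB write? */
static bool conflicts(unsigned reg, uint32_t tlb_op)
{
    if (reg == CP0_PAGEMASK || reg == CP0_ENTRYHI ||
        reg == CP0_ENTRYLO0 || reg == CP0_ENTRYLO1)
        return true;
    return (tlb_op == TLBWR && reg == CP0_RANDOM) ||
           (tlb_op == TLBWI && reg == CP0_INDEX);
}

/* Return the index of the first TLBWI/TLBWR that has fewer than two
 * instructions between it and a conflicting CP0 write, or -1 if none. */
long first_tlb_write_hazard(const uint32_t *code, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (code[i] != TLBWI && code[i] != TLBWR)
            continue;
        /* Only the two preceding slots can violate the spacing rule. */
        for (size_t j = (i >= 2 ? i - 2 : 0); j < i; j++) {
            unsigned reg;
            if (writes_cp0(code[j], &reg) && conflicts(reg, code[i]))
                return (long)i;
        }
    }
    return -1;
}
```

The same window-of-two scan could be adapted to the TLBR, TLBP, ERET, and HI/LO rules by
changing which instructions count as the producer and the consumer.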